WikiToLearn migration, why?

Well, currently WikiToLearn runs on MediaWiki, which is a good model for an encyclopedia but, when you are trying to build more structured content, it doesn’t fit.

For release 1.0 we developed CourseEditor, which tries to make the unstructured content more structured, for example by offering a drag-and-drop UI to manage a course structure.

However, this isn’t enough: the issue with the MediaWiki data structure is that versioning happens only at the single-page level. That is a fine design for something with little to no need for unambiguous references between pages, but it is a very big deal when you are talking about a course.

This is the main reason why we had to think about something new.

Since we need something new anyway, we also have the opportunity to build something that makes it easier to add features, functions and capabilities, and to try new things without worrying too much about the whole stack.

Thanks to big support from GARR, we have access to a lot of computing power, not only in terms of raw CPU/RAM/storage/network but also in terms of the number of servers (or VMs). This means we can build something distributed, and therefore fault tolerant and resilient.

The first thing that comes to mind in this scenario is: microservices! Microservices everywhere!

With containers, containers all around the place!

Yes, microservices, but not in the “put a LAMP stack in Docker” way of doing it: with a properly layered design.

MediaWiki is not designed with micro-services in mind, and we already run MediaWiki inside Docker, so we need to design our platform from scratch.

The first step for this ambitious project is the storage: we have to store the user data in a safe way, and this is the critical service, the one we can’t get wrong – there is no second chance to do it right.

Right now we are testing Eve to store everything in an object storage with a RESTful API; the backend at the moment is MongoDB (with replication).
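To give an idea of what this looks like, here is a minimal sketch of an Eve service on top of MongoDB; the resource and field names are just placeholders, not our real schema.

# settings.py - minimal Eve configuration sketch (names are placeholders)
MONGO_HOST = "localhost"
MONGO_PORT = 27017
MONGO_DBNAME = "wikitolearn"

# keep every version of every document
VERSIONING = True

DOMAIN = {
    "pages": {
        "schema": {
            "title": {"type": "string", "required": True},
            "content": {"type": "string"},
        },
    },
}

The service itself is then just a couple of lines:

# run.py - expose the resources defined in settings.py over HTTP
from eve import Eve

app = Eve()

if __name__ == "__main__":
    app.run()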

Now the very big issue is: how do we transform the MediaWiki data structure into something with rigid internal references and course-wide versioning?

For this we used MongoDB as temporary storage: we load the data there and process every page to find and resolve every reference.
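To give a rough idea of this pass, here is a minimal sketch; the collection name, the field names and the link handling are simplified placeholders, not the real migration code.

# sketch: resolve [[wiki links]] between pages staged in MongoDB
# (collection and field names are placeholders, the real code does more)
import re
from pymongo import MongoClient

db = MongoClient()["migration_tmp"]

for page in db.pages.find():
    refs = []
    # find every [[Target]] or [[Target|label]] link in the wikitext
    for target in re.findall(r"\[\[([^\]|]+)", page.get("wikitext", "")):
        linked = db.pages.find_one({"title": target.strip()})
        if linked is not None:
            refs.append(linked["_id"])
    # store the resolved references back on the page document
    db.pages.update_one({"_id": page["_id"]}, {"$set": {"refs": refs}})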

The migration is working quite well now; it’s not done yet, but we are confident we can pull off the magic trick very soon.


MongoDB for WikiToLearn migration


Today I want to talk about my experience with the WikiToLearn migration.

The problem with every migration is getting your hands on the data in a form you can actually work with.

Going straight from the MySQL backend to a versioned object storage (Python Eve is the one we are trying now) is not an option.

The solution is to use a temporary database to hold the data, process it there, and afterwards upload everything to the destination.

After some attempts we got a pipeline that reads all the MediaWiki pages, parses the structure and uploads everything to Eve, using MongoDB as temporary storage.
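The last stage of the pipeline is conceptually the simplest one; as a sketch (the endpoint URL and the payload fields are placeholders), it boils down to something like this:

# sketch: push processed documents from the temporary MongoDB to Eve
# (the endpoint URL and the payload fields are placeholders)
import requests
from pymongo import MongoClient

EVE_ENDPOINT = "http://localhost:5000/pages"

db = MongoClient()["migration_tmp"]

for page in db.pages.find():
    payload = {"title": page["title"], "content": page["wikitext"]}
    response = requests.post(EVE_ENDPOINT, json=payload)
    response.raise_for_status()  # fail loudly if the upload is rejected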

But why mongo?

There are tons of databases and, after all, why use a DBMS as temporary storage at all?

Well, the first thing is that MongoDB is an implicit-schema DBMS (there is no such thing as schema-less). This is useful because you can add and remove fields at will without restructuring the whole data schema, which speeds up quick hacks and tests.
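A tiny example of what I mean (collection and fields invented for the occasion): two documents with different shapes can live in the same collection, with no migration step in between.

# sketch: documents with different fields in the same collection
from pymongo import MongoClient

pages = MongoClient()["migration_tmp"]["pages"]

pages.insert_one({"title": "Algebra", "wikitext": "..."})
# later we decide we also want a "refs" field: no ALTER TABLE needed
pages.insert_one({"title": "Analysis", "wikitext": "...", "refs": []})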

The second point is that a DBMS is fast. You can try to build your own in-memory representation of the data (I’ve tried), but either it is quite hard and quite slow, or it ends up being just another DBMS, so why not use an existing one?

The next point is persistence: when you work with a non-trivial dataset, it is quite nice to be able to re-run only a part of the migration, which speeds up development.

MongoDB also has mongodump and mongorestore, which let you snapshot everything and restore from a “checkpoint”.
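For example (the database name here is just a placeholder), a checkpoint is roughly:

 mongodump --db migration_tmp --out dump/
 mongorestore --db migration_tmp dump/migration_tmp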

I hope I’ve given you some good points to think about the next time you have to migrate from an old datastore to a new one.


Dumbing down every IT solution is dumb

Every day we see a growing push to make everything look easy in the IT world.

Every “make your own website”, every “build a high-performance application with just a YAML file” smells bad to me.

When you are building an IT product with lots of moving parts and you start to think that putting a Docker swarm in your application stack will solve every problem, maybe that is not the right way to face those problems.

I think that we, as IT world builders, have to stop saying “it is easy”, because most of the time it is not.

We have to start embracing the complexity of our products instead of over-simplifying everything and leaving the burden on the shoulders of our users.

One example of this trend to oversimplify is the webmaster who does not implement best practices for password storage, with the excuse: “this is only my website, it isn’t a high-profile target”.

Yes, I know, your website has fewer than 50 users, and so on and so forth.

This isn’t a valid reason to skip proper password storage, because users are lazy and mostly reuse the same password everywhere.
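Doing it decently is not even that hard; as a minimal sketch (one possible approach, not the only one), the Python standard library alone can do salted, slow password hashing:

# sketch: salted, memory-hard password hashing with the standard library
import hashlib
import hmac
import os

def hash_password(password):
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password, salt, digest):
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    # constant-time comparison to avoid timing leaks
    return hmac.compare_digest(candidate, digest)

Store the salt and the digest per user, never the password itself.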

I can already hear you complaining: “you have to use a strong password and never reuse it”, and so on…

That is a best practice, but a user is not a security expert. You have to implement better solutions; “users are wrong to use the same password for their bank account and for a small recipe website” is not a valid excuse. It’s your duty to protect the user.

This is the real world: you cannot ignore these issues and pretend that everything is built on top of best practices and security experts.

Thanks guys, see you next time.

PKI is needed for micro-services


Today I want to explain why I think a PKI is needed for proper micro-service software.

First, the problem: when you have a networked application you need a way for services to authenticate each other and to verify each other’s identity.

One way to do this is with usernames and passwords, or tokens. This works, but there are issues: where do you store the secrets, how do you deploy them to all nodes securely, and how do you revoke access for a single node?

When you are using only usernames/passwords or tokens, it is kind of a mess and you have to write everything to a config file. Revocation is not easy and needs good orchestration to avoid downtime.

PKI is a strong and standard way to have mutual authentication between two endpoints.
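As a rough sketch of what this means in practice (file paths, port and certificate names are placeholders), a service can present its own certificate and accept only clients whose certificates are signed by our internal CA:

# sketch: server side of mutual TLS, trusting only our internal CA
# (file paths and the port are placeholders)
import socket
import ssl

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="service.crt", keyfile="service.key")
context.load_verify_locations(cafile="internal-ca.crt")
context.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid certificate

with socket.create_server(("0.0.0.0", 8443)) as server:
    with context.wrap_socket(server, server_side=True) as tls_server:
        conn, addr = tls_server.accept()
        print("authenticated peer:", conn.getpeercert()["subject"])

Revoking one node then becomes a matter of revoking one certificate, instead of rotating a shared secret everywhere.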

Managing a CA is not an easy task but the effort pays off if you care about security and you want to avoid a big spaghetti-style security approach.

Someone would say: but we can trust the source IP!
The short answer to this is: no.

The long answer is: no! no! no! no! no! no! no! no! no!

An IP address is not secure by design: the network can be manipulated quite easily by anyone with L2 access (for example, from a single compromised server).

Also, the IP layer is not encrypted by default, so you have to add some kind of encryption in your application anyway; what’s the point of encrypting everything with a pre-shared key when you can use an asymmetric scheme?

I hope I’ve made my point and that you will use PKI for your next micro-service application.


The magic of ~/.ssh/config

Hi, today I want to talk about the ~/.ssh/config file.

First thing about this magic file: if you are using ssh you must have this file, this is a fact.

For example, I use git over ssh, because ssh is a very good protocol, and git over ssh does not need a TTY, so we can put something like this in the config file:

Host git.<domain>
 User git
 RequestTTY no

In this way I can run git commands over ssh without seeing the annoying “PTY allocation request failed” message.

Another piece of sorcery helps when you have multiple ssh keys. I have one key for each “scope”: one for KDE, one for GitHub, one for GitLab, one for my home, etc.

I don’t want to pass the “-i” option each time to select the right key, which is why I use the IdentityFile option, for example applied to “*”.
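To give an idea (the key file names and host patterns here are just examples, not my real setup), the relevant part of the config looks like:

Host github.com
 IdentityFile ~/.ssh/id_github

Host *
 IdentityFile ~/.ssh/id_default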

Sometimes I have to connect to a server whose sshd is not reachable with a direct TCP connection. In this scenario I use the “ProxyCommand” option, a command that ssh executes to proxy the connection via another host, for example “ProxyCommand ssh bastion.<domain> nc %h %p”.
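Inside the config file (host names follow the same placeholder style), it becomes:

Host internal.<domain>
 ProxyCommand ssh bastion.<domain> nc %h %p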

The last useful thing is that you can create an alias for a host. For example, my server has the FQDN “srv.domain.tdl” and listens on TCP port 1900; I can create an alias like “srv” with something like:

Host srv
 HostName srv.domain.tdl
 Port 1900

With this config I can run “ssh srv” and be on my server.

Thanks for reading.

Micro-services are only half the picture


Today I want to share my thoughts about the general hype around micro-services.

The first objection one can raise against this approach is that it does not really solve the problem of maintainable code, because the same principles can be found in a lot of other paradigms that did not prevent bad software from being produced.

I believe the turning point of the micro-services approach is that it is compatible with the devops philosophy.

With the combination of micro-services and devops you get software with reasonably well-defined boundaries, whose management is assigned to the same people who developed it.

This combination avoids development shortcuts that make management more difficult (maintenance is a big deal).

This also helps to solve one of the great open problems of IT: documentation.

It is true that it cannot force us to produce documentation but, at least, the people who run the code are exactly the people who wrote it, and I can assume that whoever writes the code knows how it is supposed to work.

It is now possible to build applications with performance and functionality unimaginable before, thanks to the fact that each component can be built, evolved and deployed with the best life cycle we are able to come up with, without limiting the entire ecosystem.

Thanks for reading, see you next time.

Hack your life!


Today I want to encourage you to hack your life.
I’m not talking about things like “opening a wine bottle with a CD”, I’m talking about real hacks.

Hacking something is about seeing it in a new way, a way it was never supposed to be seen.

When you think of a hacker, the first picture is somebody with a computer (or a smartphone), and most of the time that is quite correct.

But just think about what this person is doing: mainly they are trying to use software in a strange way to get something new, and I think everyone should do this with their own life.
I’m trying to do so, trying to replace my bad habits with something useful to me.

This kind of hacking is not easy: the real world is not like software (in this respect at least), you can’t reset to a checkpoint, so hacking your life is quite dangerous.
But sometimes you have to try.

When you hack the real world you may find something funny, for example a “bug” in common sense, and this “bug” can be used to reach your goal.

Like all scientific research, this has no clear, useful return on investment, and that is exactly the point.

The only way to find out what you will find is to go and find it.

So…hack your life!

SSH and complex configs


Today I want to talk about the .ssh/config file; for those who don’t know it, it is the configuration file where you customize the options SSH uses to connect.

The issue with this file is that it does not support any kind of “include”, which is a problem if you have to write a long config file.

I wrote a bit of shell script to work around this (you can see the script here).

The script builds .ssh/config by reading slices of configuration from .ssh/config.d/, in order and recursively.
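The core idea is tiny; a minimal sketch of it (my real script does a bit more than this) is essentially:

 find ~/.ssh/config.d -type f | sort | xargs cat > ~/.ssh/config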

I hope this is helpful for someone.

How social network algorithms destroy our perception of reality

Good evening, today I want to talk about something very important that, perhaps, not everyone is fully aware of.

A site like Facebook or Twitter collects gigabytes of information every second from a huge number of sources; thinking that everyone could receive every update from their friends and from the pages they follow is crazy.

To avoid this kind of “bombardment”, these sites implement a mechanism that “selects” only the posts “closest” to the person.

This creates a friendly environment, a place where we like to spend time, and this fits the goals of a platform that makes its profit from advertising.

The problem arises when the environment is so friendly that, in practice, we only see what we are in perfect agreement with.

This filtered view of reality creates the illusion that everyone agrees with us, that what we think is the common opinion, and this can reinforce wrong beliefs or consolidate crazy ideas.

So it is time to make the effort to look for what we do not like, to build our own counter-arguments, so that we do not completely and inexorably lose touch with reality.

Have a good evening, see you next time.

“Once you stop learning you start dying”


The quote “once you stop learning you start dying” is from Albert Einstein and I’d love to explain why he was right.

In the first place: I started my journey into the IT world when I was 10 years old, and as I learned new things I discovered new possibilities to keep on learning. Today, after 10 years, the situation has not changed in any way.

The big problem with keeping on learning is finding a mentor to help you with what you want to learn, or a reliable source of content.

Given the distributed architecture of the network, it is not very hard to find good material with some kind of peer review for a lot of subjects.

However, there is a hidden truth about the net: somewhere in the world there still has to be a server in a datacenter.

This is why in WikiToLearn we are trying to involve many people such as students, teachers and researchers.

I believe this is why we can offer something useful: it is a merger of two worlds, and this can be extremely powerful for spreading knowledge in its highest forms.

I hope to keep learning forever, because I know that out there there is stuff I cannot even imagine today and, sadly, maybe not even tomorrow.