WikiToLearn migration, why?

Well, currently WikiToLearn runs on MediaWiki, which is a good model for an encyclopedia but, when you are trying to build more structured content, it doesn’t fit.

For release 1.0 we developed CourseEditor, which tries to give the unstructured content more structure, for example by offering a drag-and-drop UI to manage a course structure.

However, this isn’t enough: the issue with the MediaWiki data structure is that versioning happens only at the single-page level. That is fine for something with little to no need for unambiguous references between pages, but it is a very big deal when you are talking about a course.

This is the main reason why we had to think about something new.

Because we need something new anyway, we have the opportunity to build something that makes it easier to add features, functions and capabilities, and to try new things without worrying too much about the whole stack.

Thanks to strong support from GARR, we have access to a lot of computing power, not only in terms of raw CPU/RAM/storage/network but also in terms of the number of servers (or VMs). This means we can build something distributed, and therefore fault tolerant and resilient.

The first thing that comes to mind in this scenario is: microservices! Microservices everywhere!

With containers, containers all over the place!

Yes, microservices, not in the “put a LAMP stack in Docker” way of doing it, but with a properly layered design.

MediaWiki is not designed with micro-services in mind, and we already run MediaWiki inside Docker, so we need to design our platform from scratch.

The first step of this ambitious project is the storage: we have to store the user data in a safe way, and this is the critical service, the one we can’t get wrong. There is no second chance to do it right.

Right now we are testing Eve to store everything in an object storage with a RESTful API; the backend at the moment is MongoDB (with replication).
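
To give an idea, a minimal Eve configuration with versioning enabled looks roughly like the sketch below; the MongoDB URI, resource name and fields are made up for illustration, not our real schema.

# settings.py for a minimal Eve instance; start it with: from eve import Eve; Eve().run()
MONGO_URI = "mongodb://mongo1,mongo2,mongo3/wtl?replicaSet=rs0"  # hypothetical replica set
VERSIONING = True  # keep every version of every document
RESOURCE_METHODS = ["GET", "POST"]
ITEM_METHODS = ["GET", "PATCH", "DELETE"]
DOMAIN = {
    "pages": {
        "schema": {
            "title": {"type": "string", "required": True},
            "content": {"type": "string"},
        },
    },
}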

Now the very big issue is: how do we transform the MediaWiki data structure into something with rigid internal references and course-wide versioning?

For this we use MongoDB as temporary storage to work on the data, processing every page to find and resolve every reference.
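
Stripped down to a minimal sketch, the idea is the following; collection and field names are illustrative, not the real migration code.

import re
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["migration_tmp"]
link_re = re.compile(r"\[\[([^\]|#]+)")  # naive [[Page title]] matcher

for page in db.pages.find():
    refs = []
    for title in link_re.findall(page.get("wikitext", "")):
        target = db.pages.find_one({"title": title.strip()}, {"_id": 1})
        if target is not None:
            refs.append(target["_id"])  # stable internal reference
    db.pages.update_one({"_id": page["_id"]}, {"$set": {"references": refs}})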

The migration is working quite well; it’s not done yet, but we are confident we can pull off the magic trick very soon.

Bye!

MongoDB for WikiToLearn migration

Hi!

Today I want to talk about my experience with the WikiToLearn migration.

The problem with every migration is getting your hands on the data in a form you can actually work with.

Starting from the MySQL backend and trying to push everything straight into a versioned object storage (Python Eve is the one we are trying now) is not an option.

The solution is to use a temporary database to hold the data, process it there, and afterwards upload everything to the destination.

After some attempts we ended up with a pipeline that reads all the MediaWiki pages, parses the structure and uploads everything to Eve, using MongoDB as temporary storage.
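
The last step of that pipeline, pushing the processed pages from the temporary MongoDB into Eve through its REST API, boils down to something like this (the endpoint and field names are assumptions, not the actual WikiToLearn resources):

import requests
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["migration_tmp"]
EVE_PAGES = "http://localhost:5000/pages"  # hypothetical Eve endpoint

for page in db.pages.find():
    payload = {"title": page["title"], "content": page["wikitext"]}
    requests.post(EVE_PAGES, json=payload).raise_for_status()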

But why mongo?

There are tons of databases and, after all, why use a DBMS as temporary storage?

Well, the first thing is that MongoDB is an implicit-schema DBMS (there is no such thing as schema-less), which is useful because you can add and remove fields at will without restructuring the whole data schema, and this speeds up the quick hacks you write to test things.
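
For example, trying out a new field on the whole dataset and throwing it away afterwards is a one-liner each way (collection and field names are made up):

from pymongo import MongoClient

pages = MongoClient()["migration_tmp"]["pages"]
pages.update_many({}, {"$set": {"needs_review": True}})   # add a field to every document
pages.update_many({}, {"$unset": {"needs_review": ""}})   # drop it when the experiment is over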

The point I want to make is that a DBMS is fast. You can try to build an in-memory representation of the data yourself (I’ve tried), but it ends up either quite hard and quite slow, or it turns into yet another DBMS, so why not use an existing one?

The next point is about persistence: when you work with a non-trivial dataset, it is quite nice to be able to re-run only part of the migration, as this speeds up development.

MongoDB also has mongodump and mongorestore, which let you snapshot everything and restore from a “checkpoint”.
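
For example (the database name and paths here are just an example):

mongodump --db migration_tmp --out /tmp/checkpoint-after-parse
mongorestore --db migration_tmp /tmp/checkpoint-after-parse/migration_tmp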

I hope I’ve given you some good points to think about the next time you have to migrate from an old datastore to a new one.

Bye!

Dumbing down every IT solution is dumb

Every day we see a growing push to make everything look easy in the IT world.

Every “make your own website”, every “build a high-performance application with just a YAML file” smells bad to me.

When you are building an IT product with lots of moving parts and you sometimes think that putting a Docker swarm in your application stack could solve every problem, maybe that is not the right way to face those problems.

I think that we, as builders of the IT world, have to stop saying “it is easy”, because most of the time it is not.

We have to start embracing the complexity of our products and stop over-simplifying everything at the expense of our users.

One example of this trend to oversimplify is the webmaster who does not implement best practices for password storage, with the usual excuse: it’s only my website, it isn’t a high-profile target.

Yes, I know, your website has fewer than 50 users and so on and so forth.

That isn’t a valid reason to avoid implementing proper password storage, because users are lazy and they mostly use the same password everywhere.

I can hear you complaining: “you have to use a strong password and never reuse it” and so on…

That is a best practice, but a user is not a security expert. You have to improve and implement better solutions; “users are wrong to reuse the same password for their bank account and for a small website that hosts recipes” is not a valid excuse. It’s your duty to protect the user.
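
And doing it reasonably well really takes very little code; here is a minimal sketch of salted password hashing with the Python standard library (the storage format and iteration count are just an example, tune them for your case):

import hashlib, hmac, os

def hash_password(password):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100000)
    return salt + digest  # store salt and digest together

def verify_password(password, stored):
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100000)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison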

This is the real world: you cannot ignore these issues and pretend that the world is built on top of best practices and security experts.

Thanks guys, see you next time.

PKI is needed for micro-services

Hello!

Today I want to explain why I think a PKI is needed for proper micro-service software.

First, the problem: when you have a networked application you need a way for one service to authenticate itself to another, and for each service to verify the other.

One way to do this is with usernames and passwords, or tokens. This solution works well, but there are issues: where to store the secret data, how to deploy it to all nodes in a secure way, and how to revoke access for a single node.

When you are using only usernames/passwords or tokens, it is kind of a mess and you have to write everything to a config file. Revocation is not easy and needs good orchestration to avoid downtime.

PKI is a strong and standard way to have mutual authentication between two endpoints.

Managing a CA is not an easy task but the effort pays off if you care about security and you want to avoid a big spaghetti-style security approach.
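
With your own CA in place, mutual TLS between two services is not much code either; here is a minimal sketch in Python (the certificate paths, port and plain-socket setup are just an illustration):

import socket, ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("service-a.crt", "service-a.key")  # this service's identity
ctx.load_verify_locations("internal-ca.crt")           # the internal CA we trust
ctx.verify_mode = ssl.CERT_REQUIRED                    # refuse clients without a valid certificate

with socket.create_server(("0.0.0.0", 8443)) as srv:
    with ctx.wrap_socket(srv, server_side=True) as tls_srv:
        conn, addr = tls_srv.accept()  # the handshake verifies the client certificate
        print("authenticated peer:", conn.getpeercert().get("subject"))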

Someone would say: but we can trust the source IP!
The short answer to this is: no.

The long answer is: no! no! no! no! no! no! no! no! no!

An IP address is not secure by design: the network can be manipulated quite easily by anyone with L2 access (a single compromised server, for example).

Also, the IP layer is not encrypted by default, so if you have to add some kind of encryption on top in your application anyway, what’s the point of encrypting everything with a pre-shared key when you can use an asymmetric scheme?

I hope I’ve made my point and that you will use PKI for your next micro-service application.

Bye!

The magic of ~/.ssh/config

Hi, today I want to talk about the ~/.ssh/config file.

First thing about this magic file: if you are using ssh you must have this file, this is a fact.

For example, I use git over ssh, because ssh is a very good protocol, and git over ssh doesn’t need a TTY, so we can put something like this in the config file:

Host git.kde.org
 User git
 RequestTTY no

In this way I can execute

ssh git.kde.org

without seeing the annoying “PTY allocation request failed” message.

Another bit of sorcery happens when you have multiple ssh keys. I have one key for each “scope”, for example: one for KDE, one for GitHub, one for GitLab, one for my home, etc.

I don’t want to use the “-i” option every time to select the right key, which is why I use the IdentityFile option, for example applied to “*.kde.org”.
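
For example (the key file paths are just an example of how one could split them):

Host *.kde.org
 IdentityFile ~/.ssh/id_kde

Host github.com
 IdentityFile ~/.ssh/id_github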

Sometimes I have to connect to a server whose sshd daemon I cannot reach with a direct TCP connection. In this scenario I use the “ProxyCommand” option, which is a command ssh executes to proxy the connection via another host, for example “ProxyCommand ssh bastion.<domain> nc %h %p”.
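
Put into a Host block it looks something like this (the host names are placeholders):

Host internal.example.org
 ProxyCommand ssh bastion.example.org nc %h %p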

The last useful thing is that you can create an alias for a host. For example, my server with FQDN “srv.domain.tdl” listens on TCP port 1900; I can create an alias like “srv” with something like:

Host srv
 HostName srv.domain.tdl
 Port 1900

With this config I can run “ssh srv” and be on my server.

Thanks for reading.

Micro-services are only half the picture

Hello,

today I want to share my thoughts about the general hype around micro-services.

The first objection one can raise against this approach is that it does not really solve the problem of writing maintainable code, because the same principles can be found in a lot of other paradigms that did not prevent bad software from being produced.

I believe the turning point of micro-services is that they are compatible with the devops philosophy.

With the combination of micro-services and devops you get software that has reasonably well-defined boundaries and whose management is assigned to the people who developed it.

This combination discourages the development shortcuts that make operations more difficult (maintenance is a big deal).

This also helps with one of the great open problems of IT: documentation.

It is true that it cannot force us to produce documentation but, at least, whoever runs the code is exactly whoever wrote it, and I can only assume that whoever writes the code knows how it is supposed to work.

It is now possible to build applications with performance and functionality unimaginable before, all thanks to the fact that each component can be built, evolved and deployed with the best life cycle we can devise, without constraining the entire ecosystem.

Thanks for reading, see you next time

Hack your life!

Hi!

Today I want to encourage you to hack your life.
I’m not talking about things like “open a wine bottle with a CD”, I’m talking about real hacks.

Hacking something is about viewing that thing in a new way, a way it was never supposed to be viewed.

When you think of a hacker, the first picture is somebody with a computer (or a smartphone), and most of the time that is quite correct.

But just think about what this person is doing: mainly they are trying to use software in a strange way to get something new, and I think everyone should do this with their own life.
I’m trying to do so, trying to replace my bad habits with something useful to me.

This kind of hacking is not easy: the real world is not like software (in this respect at least), you can’t reset to a checkpoint, so hacking your life is quite dangerous.
But sometimes you have to try.

When you hack the real world you might find something funny, for example a “bug” in common sense, and this “bug” can be used to reach your goal.

Like all scientific research, this has no clear, useful return on investment, and that is exactly the point.

The only way to find out what you will find is to go and find it.

So…hack your life!

SSH and complex configs

Hi!

Today I want to talk about the .ssh/config file; for those who don’t know it, it is the configuration file where you customize the options SSH uses to connect.

The issue with this file is that it doesn’t support any kind of “include”, which can be a problem if you have to write a long config file.

I wrote a bit of shell script to work around this (you can see the script here: https://quickgit.kde.org/?p=scratch%2Ftomaluca%2Fssh-build-config.git).

This script builds the .ssh/config by reading config fragments from .ssh/config.d/, in order and recursively.
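
Leaving aside the details of the actual script, the core idea is just an ordered concatenation, roughly:

find ~/.ssh/config.d -type f | sort | xargs cat > ~/.ssh/config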

I hope this is helpful for someone.

Ansible automation tool

Hi!

These days I’m working to improve my skills in preparing, testing and deploying complex IT systems like mail servers or database clusters.

To accomplish this I started using Ansible to speed up the operations.

With Ansible it is quite easy to set up a configuration template and the procedure to bring up a new service or re-configure an existing one.
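
Just to give an idea, a minimal playbook sketch looks like this (the group name, template path and service are made up for the example):

- hosts: mailservers
  become: yes
  tasks:
    - name: deploy the main configuration from a template
      template:
        src: templates/main.cf.j2
        dest: /etc/postfix/main.cf
      notify: restart postfix
  handlers:
    - name: restart postfix
      service:
        name: postfix
        state: restarted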

Unlike other automation tools like Puppet, it doesn’t require any kind of specialized server: it uses ssh to access all the servers, which can also be a good way around firewall/network ACL issues.

I’m thinking about migrating all my shell scripts to an Ansible structure, but first I have to run some tests.

Bye!

I’m at the GARR Workshop 2016

Hi everyone! Today I’m at the GARR Workshop 2016, hosted at the CNR headquarters in Rome, and I just presented to the audience how we at WikiToLearn work on the technical side of our project.

GARR Workshop 2016

I was invited to deliver a talk about WikiToLearnHome, our devops infrastructure and automation system.
Tomorrow Riccardo will deliver a talk within the plenary track to introduce WikiToLearn to the 300+ university representatives who came to learn about innovation in digital education.

http://www.garr.tv/home/viewvideo/1012/gdl-cloud-a-storage-sviluppare-wikitolearn-dal-laptop-al-datacenter-ltoma-workshop-garr-2016-roma