Hackathon project, tracking fake news in social networks

On April 3-6, 2020, I took part in the LauzHack against COVID-19 hackathon, organized by the LauzHack association and EPFL. It was an interesting experience, and I would like to summarize here the results we obtained on our project during the 3 days, as well as my impressions.

The steps for closing the financial year

Every year, companies must draw up a balance sheet and report key information to various administrative bodies. For the first financial year of Evia Cybernetics, I decided to do this myself. The goal was both to cut costs and to learn some of the basics of accounting and company administration. I learned a lot. Here I describe my experience of the different steps involved in closing a financial year.

On the road to the Bac

For my English readers, this post is in French. It deals with student data from the French education system, so I thought it would be better to write it in French. I recently went back to high school (lycée). Not as a student but as a teacher. For 2 months I taught mathematics (or at least I tried!) and, looking at the grades from the various tests, I must say I was a little puzzled. Indeed, the distribution of the grades around the mean is peculiar. It sometimes seems to have a Gaussian shape, more or less spread out, and at other times a more complex shape with what appears to be a split into 2 groups (2 Gaussians). Since I love data analysis and have all the tools at hand with Python, I dove into analyzing the grades. I used not only the statistics I taught my students (part of the Bac curriculum), but it was also an opportunity to use my favorite machine learning and artificial intelligence algorithms, here the Gaussian mixture model. The results and the analysis itself are interesting, which is why I am sharing them. Here is what I found…
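
To give an idea of what a Gaussian mixture model does with grades, here is a minimal sketch of a two-component fit using the expectation-maximization (EM) algorithm in plain Python. It is not the code from the post: the function names, the initialization, the standard deviation floor and the sample grades are all assumptions made up for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def fit_two_gaussians(grades, iterations=100, min_std=0.1):
    """Fit a mixture of 2 Gaussians to 1D data with EM (illustrative sketch)."""
    n = len(grades)
    overall_mean = sum(grades) / n
    overall_std = math.sqrt(sum((g - overall_mean) ** 2 for g in grades) / n) or 1.0

    # Assumed initialization: one component at each extreme, equal weights.
    means = [min(grades), max(grades)]
    stds = [overall_std, overall_std]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: probability that each grade belongs to each component.
        responsibilities = []
        for g in grades:
            p = [weights[k] * gaussian_pdf(g, means[k], stds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            responsibilities.append([pk / total for pk in p])

        # M-step: re-estimate weight, mean and std of each component.
        for k in range(2):
            nk = sum(r[k] for r in responsibilities)
            if nk < 1e-9:
                continue  # component is empty, keep its previous parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * g for r, g in zip(responsibilities, grades)) / nk
            var = sum(r[k] * (g - means[k]) ** 2 for r, g in zip(responsibilities, grades)) / nk
            stds[k] = max(math.sqrt(var), min_std)  # floor avoids a collapsing component

    return weights, means, stds


# Made-up grades (out of 20) with two visible groups, for illustration only.
sample_grades = [5, 6, 6, 7, 7, 8, 13, 14, 14, 15, 15, 16]
weights, means, stds = fit_two_gaussians(sample_grades)
print(weights, means, stds)
```

In practice a library implementation such as scikit-learn's `GaussianMixture` would be the more robust choice. The hand-written version is only meant to show the two alternating steps.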

Exploring the oceans, around Antarctica

The Antarctic Circumnavigation Expedition is a scientific expedition that has collected a large amount of data around Antarctica. The scientists working on the project would be happy to get some help from data analysis experts and data scientists. They plan to make the data open, and some of it was presented during the first Data Jam at EPFL in Lausanne. The goal and the dataset convinced me to join them in analyzing the sonar data. This post is a short presentation of what we managed to do during the 2 Data Jam days. It reveals some interesting information about Antarctica and about the methods used to analyze sonar signals.

Testing the Cosmos DB graph database

Graph databases are attracting growing interest as alternatives to standard SQL databases. Indeed, the graph structure may be better suited when queries focus on the relationships between the entities stored in the database. Cloud companies have spotted this trend and now offer solutions to set up graph databases in the cloud. Amazon and Google have chosen to provide ways to connect JanusGraph to DynamoDB and Bigtable respectively. Microsoft has chosen to provide its own graph database, built on Cosmos DB as a backend in Azure, while relying on Gremlin for handling queries. I have tested the Cosmos DB graph database and want to share my first impressions here.

Setting up JanusGraph on AWS using EC2 and DynamoDB

Graph databases are not yet widely used, and running one in the cloud is still not completely straightforward. I describe here the different steps I took to install and run the JanusGraph database on Amazon's cloud (AWS), using the NoSQL database DynamoDB as a storage backend.

A simple explanation of entropy in decision trees

I recently wanted to refresh my memory about machine learning methods. I spent some time reading tutorials, blogs and Wikipedia. No doubt, the Internet is really useful, and this is great. I quickly got back on track with the general idea of the mainstream algorithms. However, from time to time, I found some particular points and explanations obscure. These are often details of the algorithms that get overlooked: the emphasis is on the main part of the algorithm and some details are left out. Yet I found these missing parts quite important to fully understand what is going on in the algorithm.
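
As a concrete reminder of the quantity in the title, here is a minimal sketch of Shannon entropy and information gain as they are commonly used to score a split in a decision tree. The function names and the toy labels are assumptions chosen for illustration, not taken from the post.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy example: a perfectly mixed node split into two pure children.
parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]
print(entropy(parent))
print(information_gain(parent, children))
```

A split that separates the classes perfectly removes all the uncertainty of the parent node, so its information gain equals the parent's entropy. A split that leaves each child as mixed as the parent has a gain of zero.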

How graphs can help you find information in data

If you think graph and network techniques are only useful for graph datasets or data structured as a network of entities, you are wrong. You can leverage powerful graph methods by building a graph from unstructured data. In this example, I show how to extract information from a set of texts (emails). At the end of this post, you will discover what H. Clinton was talking about in her emails, which topics she was exchanging information on, and who or what was involved in each topic.
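
One simple way to turn a set of texts into a graph is to link words that appear in the same text. The sketch below builds such a word co-occurrence graph in plain Python. It is only an illustration of the general idea, not the method used in the post: the function name, the stopword list, the length filter and the sample texts are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"about", "with", "from", "this", "that"}


def build_cooccurrence_graph(texts, min_length=4):
    """Return {(word_a, word_b): weight}, where weight counts shared texts."""
    graph = defaultdict(int)
    for text in texts:
        # Normalize: strip punctuation, lowercase, keep each word once per text.
        words = {w.strip(".,;:!?()\"'").lower() for w in text.split()}
        words = {w for w in words if len(w) >= min_length and w not in STOPWORDS}
        # Every pair of words in the same text gets an edge (or a heavier one).
        for a, b in combinations(sorted(words), 2):
            graph[(a, b)] += 1
    return dict(graph)


# Made-up texts, for illustration only.
texts = ["Budget meeting tomorrow", "Budget meeting moved", "Lunch tomorrow"]
for edge, weight in sorted(build_cooccurrence_graph(texts).items()):
    print(edge, weight)
```

Once the texts are represented this way, standard graph methods such as community detection can be applied to the weighted edges to group words into topics.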