The tools

Cloud storage

Both datasets were stored on a computer cluster (IC Cluster) hosted at EPFL.

The datasets were huge (on the order of terabytes for Tweets Leon and gigabytes for Swiss Tweets), and they were stored in a Hadoop file system (HDFS).

To conduct the analysis, we wrote Python scripts and ran them on the cluster with Spark.
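As a rough illustration of what such a script can look like, here is a minimal sketch and not the project's actual code. It assumes the tweets are stored as one JSON object per line with a `lang` field; the function names and the input path are hypothetical. The parsing helper is plain Python, so it can be tried on a small sample without a cluster, and the Spark part only runs when a path is given.

```python
import json
import sys
from collections import Counter


def extract_lang(line):
    """Return the tweet's language code, or None if the line cannot be used.

    Assumption: each line is one JSON object with a "lang" field.
    """
    try:
        tweet = json.loads(line)
    except ValueError:
        return None
    if not isinstance(tweet, dict):
        return None
    return tweet.get("lang")


def count_langs_local(lines):
    """Pure-Python version of the same count, handy for small samples."""
    langs = (extract_lang(line) for line in lines)
    return Counter(lang for lang in langs if lang is not None)


def main(input_path):
    # pyspark is imported here so the helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tweet-lang-count").getOrCreate()
    # input_path is hypothetical, e.g. an hdfs:// location on the cluster.
    lines = spark.sparkContext.textFile(input_path)
    counts = (
        lines.map(extract_lang)
        .filter(lambda lang: lang is not None)
        .map(lambda lang: (lang, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    for lang, n in counts.collect():
        print(lang, n)
    spark.stop()


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

A script like this would typically be launched on the cluster with `spark-submit`, passing the dataset location as its only argument.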

The analysis and plots

pandas and Jupyter notebooks were indispensable for the analysis in this project: they made it easy to try things out and to share and show off our intermediate results.

The plots on this site were made with plot.ly and its Python API.

This site

This site was built with Jekyll using the beautiful-jekyll theme, and it is hosted on GitHub Pages.

Source code

The source code of this project can be found on GitHub.