This tutorial guides you in setting up a system for collecting Tweets. Not in Apache Spark or Apache Flink, but just in Python + Tweepy. In many use cases, just a single computing node can collect enough Tweets to draw decent conclusions. In future blog posts, I will explain how to collect Tweets using a cluster (and with either Apache Spark or Apache Flink). But for now, lets focus on a simple Pythonic harvester! If you are interested in scraping a website, you should definitely read this article.
How can we use machine learning to predict stockprices? In this tutorial we will make Python scripts for doing sentiment analysis on Tweets and it is explained how to use it for making predictions.
As an example, suppose we had €1000,- at the first of January of 2014 and suppose we could use the algorithm which is described in this tutorial. Then it would generate €2901,- in total on the 22th of February, 2017! The total amount of money (cash + investments) is shown in the next figure:
Despite the patience you need to have, it will be worth the waiting time eventually. As mentioned in , moods in tweets are a good indication of the movement of closing prices on a stock market. In this article, we will only predict how positive or how negative a tweet is. But it turns out that this is giving predictive signals which is accurate enough for our purposes.