This tutorial guides you through setting up a system for collecting Tweets, not with Apache Spark or Apache Flink, but with plain Python and Tweepy. In many use cases, a single computing node can collect enough Tweets to draw decent conclusions. In future blog posts, I will explain how to collect Tweets using a cluster (with either Apache Spark or Apache Flink). But for now, let's focus on a simple Pythonic harvester! If you are interested in scraping a website, you should definitely read this article.
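To make the idea of a single-node harvester concrete, here is a minimal sketch, not necessarily the approach this tutorial builds. It assumes Tweepy 4.x and access to the Twitter API v2 recent-search endpoint through `tweepy.Client.search_recent_tweets`. The helper names `harvest` and `tweepy_fetch` and the JSON Lines output file are hypothetical choices made for illustration. Storage is kept separate from fetching, so the saving logic can be exercised without network access or credentials.

```python
import json
from typing import Callable, Iterable


def harvest(fetch: Callable[[], Iterable[dict]], path: str) -> int:
    """Append every tweet yielded by `fetch` to a JSON Lines file.

    `harvest` is a hypothetical helper; it returns the number of tweets written.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as out:
        for tweet in fetch():
            # One JSON object per line keeps the file easy to append to and stream.
            out.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            count += 1
    return count


def tweepy_fetch(bearer_token: str, query: str, max_results: int = 10):
    """Build a fetch function backed by Tweepy's v2 Client (assumes Tweepy 4.x).

    `tweepy_fetch` is a hypothetical helper. The recent-search endpoint
    accepts max_results between 10 and 100.
    """
    import tweepy  # imported lazily so harvest() works without Tweepy installed

    client = tweepy.Client(bearer_token=bearer_token)

    def fetch():
        response = client.search_recent_tweets(query=query, max_results=max_results)
        # response.data is None when the search returns no tweets.
        for tweet in response.data or []:
            yield {"id": tweet.id, "text": tweet.text}

    return fetch
```

With a valid bearer token, a call such as `harvest(tweepy_fetch(token, "python -is:retweet"), "tweets.jsonl")` appends one batch of matching Tweets to `tweets.jsonl`. Running that call on a schedule turns it into a simple harvester.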
The series “Data Mining with Python on Medical Datasets for Data Mining” highlights several data mining techniques. The series is written in collaboration with John Snow Labs, which provided me with the medical datasets. In this article, basic text mining techniques are highlighted and some of the results are presented.
By the way, if you are interested in Deep Learning, you should definitely read this article on implementing a GRU in Python using TensorFlow.