Tweets

Gathering Tweets with Python

This tutorial guides you through setting up a system for collecting Tweets, not in Apache Spark or Apache Flink, but in plain Python. In many use cases, a single computing node can collect enough Tweets to draw decent conclusions. In future blog posts, I will explain how to collect Tweets using a cluster (with either Apache Spark or Apache Flink). But for now, let's focus on a simple Pythonic harvester!
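To give a rough idea of what a single-node harvester can look like, here is a minimal sketch. It is not the code from the tutorial itself: the names `harvest` and `fake_stream` are hypothetical, and the stream is a stand-in generator, since connecting to the real Twitter API needs credentials and a client library that are not covered in this excerpt. The sketch only shows the storage side, appending each Tweet as one JSON object per line.

```python
import json
from itertools import islice


def harvest(stream, path, limit=1000):
    """Append up to `limit` Tweets from `stream` to a JSON Lines file.

    `stream` can be any iterable of dict-like Tweets (an assumption;
    a real API client would be plugged in here).
    """
    count = 0
    with open(path, "a", encoding="utf-8") as out:
        for tweet in islice(stream, limit):
            # One JSON object per line keeps the file easy to append to.
            out.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            count += 1
    return count


def fake_stream():
    """Hypothetical stand-in for a live Tweet stream."""
    n = 0
    while True:
        n += 1
        yield {"id": n, "text": f"example tweet {n}"}


if __name__ == "__main__":
    stored = harvest(fake_stream(), "tweets.jsonl", limit=5)
    print(f"Stored {stored} Tweets")
```

Writing one Tweet per line means the harvester can be stopped and restarted without corrupting what was collected earlier.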


Stock price following an upward trend.

Getting Rich using Bitcoin stockprices and Twitter!

How can we use machine learning to predict stock prices? In this tutorial we will write Python scripts that perform sentiment analysis on Tweets, and explain how to use the results to make predictions.

As an example, suppose we had €1000,- on the first of January 2014 and suppose we could use the algorithm described in this tutorial. Then it would generate €2901,- in total on the 22nd of February 2017! The total amount of money (cash + investments) is shown in the next figure:

Money.
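To put the example numbers in perspective, the short calculation below turns them into a total and an annualized return. It assumes that €2901,- is the final value of the whole portfolio (cash + investments) and uses a 365.25-day year; both are assumptions made here for illustration, not statements from the tutorial.

```python
from datetime import date

# Figures from the example; treating 2901 as the final portfolio
# value is an assumption.
start_value = 1000.0
end_value = 2901.0
start_day = date(2014, 1, 1)
end_day = date(2017, 2, 22)

# Total return over the whole period.
total_return = end_value / start_value - 1

# Annualized (compound) return, using a 365.25-day year.
years = (end_day - start_day).days / 365.25
annualized = (end_value / start_value) ** (1 / years) - 1

print(f"Total return: {total_return:.1%}")
print(f"Annualized return: {annualized:.1%}")
```

Under these assumptions the total return is about 190%, which works out to roughly 40% per year over a little more than three years.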

Despite the patience you need to have, it will be worth the wait eventually. As mentioned in [1], moods in tweets are a good indication of the movement of closing prices on a stock market. In this article, we will only predict how positive or how negative a tweet is. It turns out that this gives a predictive signal that is accurate enough for our purposes.
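As a toy illustration of what "how positive or how negative a tweet is" can mean, the sketch below scores a tweet by counting words from two tiny hand-made word lists. This is not the method used in the tutorial: the word lists and the function name `tweet_polarity` are made up for this example, and a real model would be trained on labelled data.

```python
import re

# Tiny, hand-picked word lists (purely illustrative).
POSITIVE_WORDS = {"good", "great", "gain", "bullish", "love", "rise"}
NEGATIVE_WORDS = {"bad", "terrible", "loss", "bearish", "hate", "crash"}


def tweet_polarity(text):
    """Return a score in [-1, 1]: -1 is fully negative, +1 fully positive.

    Tweets without any word from either list score 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    matched = positives + negatives
    if matched == 0:
        return 0.0
    return (positives - negatives) / matched


if __name__ == "__main__":
    for tweet in ("Bitcoin is great, prices rise", "Terrible crash today"):
        print(tweet, "->", tweet_polarity(tweet))
```

Averaging such scores over all tweets of a day would give one sentiment value per day, which is the kind of signal that can then be compared with price movements.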


Overview of scientific Python packages

Overview

There are lots of Python packages available on the internet. The aim of this post is to give you an overview of scientifically oriented Python packages, sorted per topic. The list will be updated regularly. If you have any recommendations, feel free to add them in the comments!

Mathematics

  • NumPy – Powerful computational framework.
  • pandas – Data structures and data analysis.
  • matplotlib – Plotting and visualization tools.
  • SymPy – For working with symbolic mathematics.
  • Numba – JIT compiler for high-performance numerical code.

Sampling

  • emcee – Markov chain Monte Carlo sampling.

Machine Learning

Text Mining

  • scikit-learn – Machine Learning toolkit.
  • NLTK – Working with human language data.

Image processing

Speed improvements

  • Nuitka – Python compiler.
  • Cython – C-extensions for Python.
  • PyPy – Alternative Python interpreter with a JIT compiler, for better speed and memory use.

Editors

Hyperframeworks

  • Anaconda – A bundle of the most used Python libraries.
  • SciPy – A bundle of the most used scientific Python libraries.
