Blockchain.

What Is The Blockchain And Why Should You Care?

People who are just learning about Bitcoin often assume that all transactions are anonymous, insecure and untraceable. After all, cryptocurrency transactions are barely regulated and are conducted over the Internet. However, Bitcoin's pseudonymous creator designed a public ledger that records every Bitcoin transaction. This system is known as the blockchain.
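To make the idea of a chained ledger concrete, here is a toy Python sketch (not Bitcoin's actual implementation) in which every block stores the hash of the block before it, so tampering with an old transaction breaks the chain:

```python
import hashlib
import json

def block_hash(block):
    # Hash a block's contents; a stand-in for Bitcoin's double SHA-256 header hash.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

# Each block records transactions plus the hash of the previous block, so
# changing an old transaction would invalidate every block that follows it.
chain = [{"prev_hash": "0" * 64, "transactions": ["alice -> bob: 1 BTC"]}]
chain.append({"prev_hash": block_hash(chain[-1]),
              "transactions": ["bob -> carol: 0.5 BTC"]})

# Verify the ledger by recomputing each predecessor's hash.
for prev, block in zip(chain, chain[1:]):
    assert block["prev_hash"] == block_hash(prev), "the chain has been tampered with"
print("ledger is consistent")
```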

Also interesting: this post, which explains a new blockchain-based security threat, and this post, which describes a Python-based cryptocoin history scraper.

Data Blogger Courses

Mastering Pandas

In this course, you will learn how to use the Python Pandas library. After the course, you will be able to:

  • Load and transform your data
  • Visualize data using line plots, scatter plots and histograms
  • Merge and store data

The course also includes more advanced topics, such as data parallelization and aggregation.
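As a small taste of the material, here is a minimal Pandas sketch; the data, column names and output file are made up for illustration and are not the course datasets:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data standing in for the course datasets; columns and values are made up.
prices = pd.DataFrame({"day": pd.date_range("2017-01-01", periods=90),
                       "price": range(90)})
volumes = pd.DataFrame({"day": pd.date_range("2017-01-01", periods=90),
                        "volume": [v % 7 for v in range(90)]})

# Merge the two datasets on their shared 'day' column.
df = prices.merge(volumes, on="day")

# Aggregate: the average price per week.
weekly = df.set_index("day")["price"].resample("W").mean()

# Visualize with a line plot and a histogram.
weekly.plot(kind="line", title="Average weekly price")
plt.figure()
df["volume"].plot(kind="hist", bins=7)
plt.show()

# Store the merged result.
df.to_csv("merged.csv", index=False)
```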

You can find all course content under “Curriculum” on Data Blogger Courses, and the first three lessons are free. The first free lesson can be found here.


Apache Flink: The Next Distributed Data Processing Revolution?

Disclaimer: the results are only valid when network-attached storage is used in the computing cluster.

The logo of Apache Flink.

The amount of data has grown significantly over the past few years, and a single machine can no longer process it all. Therefore, the need for distributed data processing frameworks is growing. It all started back in 2011, when the first version of Apache Hadoop (1.0.0) was released. The Hadoop framework can store large amounts of data on a cluster. This storage layer is known as the Hadoop Distributed File System (HDFS), and it is used at almost every company that has to store terabytes of data every day. Then the next problem arose: how can companies process all the stored data? This is where distributed data processing frameworks come into play. In 2014, Apache Spark was released, and it now has a large community; almost every IT department has implemented at least some lines of Apache Spark code. As companies gather more and more data, the demand for faster data processing frameworks keeps growing. Apache Flink (which reached version 1.0 in March 2016) is a new face in the field of distributed data processing and is one answer to that demand.
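To give an impression of what a few lines of such a framework look like, here is a minimal word-count sketch using PyFlink's Table API, the Python bindings that ship with recent Flink releases (newer than the release discussed here); the tiny dataset is made up for illustration:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Create a batch table environment (Flink also supports a streaming mode).
t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# A tiny in-memory table standing in for data that would normally live on HDFS.
words = t_env.from_elements([("spark",), ("flink",), ("flink",)], ["word"])
t_env.create_temporary_view("words", words)

# Count how often each word occurs; Flink distributes this query over the cluster.
result = t_env.sql_query("SELECT word, COUNT(*) AS cnt FROM words GROUP BY word")
result.execute().print()
```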
