People who are just learning about Bitcoin often think that all transactions are anonymous, insecure and untraceable. After all, cryptocurrency transactions are barely regulated and are conducted over the Internet. However, the anonymous founders of Bitcoin developed a ledger system that tracks every Bitcoin transaction. This system is known as the blockchain.
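To make the idea of a transaction-tracking ledger concrete, here is a minimal sketch of a hash-chained ledger in Python. It is a toy illustration, not how Bitcoin itself is implemented: the block layout, the function names (`block_hash`, `append_block`, `is_valid`) and the sample transactions are all assumptions made for this example, and it leaves out everything else a real blockchain needs, such as mining, signatures and networking.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block (assumption)


def block_hash(index, transactions, previous_hash):
    """Hash a block's contents together with the hash of the block before it."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Append a new block that points at the current last block of the chain."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": block_hash(index, transactions, previous_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i > 0 else GENESIS_HASH
        if block["previous_hash"] != expected_previous:
            return False
        recomputed = block_hash(
            block["index"], block["transactions"], block["previous_hash"]
        )
        if block["hash"] != recomputed:
            return False
    return True


# Hypothetical sample transactions, for illustration only.
ledger = []
append_block(ledger, [{"from": "alice", "to": "bob", "amount": 1.5}])
append_block(ledger, [{"from": "bob", "to": "carol", "amount": 0.5}])
```

Because each block stores the hash of its predecessor, changing an old transaction changes that block's hash and breaks the check in `is_valid`, which is what makes the recorded history traceable and hard to alter quietly.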
In this course, you will learn how to use pandas, the Python data analysis library. After the course, you will be able to:
- Load and transform your data
- Visualize data using line plots, scatter plots and histograms
- Merge and store data
The course also includes more advanced topics, such as data parallelization and aggregation.
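As a rough preview of the skills listed above, the following sketch loads, transforms, merges, aggregates and stores a small dataset with pandas. The CSV contents, column names and the `is_large` threshold are made up for illustration and are not taken from the course material. Plotting is left out so that the example needs nothing beyond pandas.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV data standing in for real files.
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "1,10,20.0\n"
    "2,11,35.5\n"
    "3,10,14.5\n"
)
customers_csv = io.StringIO(
    "customer_id,name\n"
    "10,Ada\n"
    "11,Grace\n"
)

# Load the data.
orders = pd.read_csv(orders_csv)
customers = pd.read_csv(customers_csv)

# Transform: flag orders above an arbitrary threshold.
orders["is_large"] = orders["amount"] > 30

# Merge: attach the customer name to every order.
merged = orders.merge(customers, on="customer_id", how="left")

# Aggregate: total order amount per customer.
totals = merged.groupby("name")["amount"].sum()

# Store: write the merged table back out as CSV (to a buffer here).
output = io.StringIO()
merged.to_csv(output, index=False)
```

With real data, the `io.StringIO` buffers would be replaced by file paths passed to `pd.read_csv` and `DataFrame.to_csv`.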
Disclaimer: The results are valid only when network-attached storage is used in the computing cluster.
The amount of data has grown significantly over the past few years, and it is no longer feasible for a single machine to process it all. The need for distributed data processing frameworks is therefore growing. It all started in 2011, when version 1.0.0 of Apache Hadoop was released. The Hadoop framework can store large amounts of data across a cluster through the Hadoop Distributed File System (HDFS), which is used at almost every company that has to store terabytes of data every day. Then the next problem arose: how can companies process all the stored data? This is where distributed data processing frameworks come into play. In 2014, Apache Spark was released, and it now has a large community. Almost every IT department has implemented at least a few lines of Apache Spark code. As companies gather more and more data, the demand for faster data processing frameworks keeps growing. Apache Flink (released in March 2016) is a newcomer in the field of distributed data processing and is one answer to that demand.