Apache Flink: The Next Distributed Data Processing Revolution?

Disclaimer: The results are valid only when network-attached storage is used in the computing cluster.

The logo of Apache Flink.

The amount of data has grown significantly over the past few years, and it is no longer feasible for a single machine to process it all. The need for distributed data processing frameworks is therefore growing. It all started back in 2011, when the first version of Apache Hadoop (version 1.0.0) was released. The Hadoop framework can store large amounts of data on a cluster. Its storage layer is known as the Hadoop Distributed File System (HDFS), and it is used at almost every company that has to store terabytes of data every day. Then the next problem arose: how can companies process all the stored data? This is where distributed data processing frameworks come into play. In 2014, Apache Spark was released, and it now has a large community. Almost every IT department has implemented at least some lines of Apache Spark code. Companies keep gathering more data, and the demand for faster data processing frameworks keeps growing. Apache Flink (released in March 2016) is a new face in the field of distributed data processing and is one answer to that demand.


Monitoring your cluster in just a few minutes using ISA

CPU Sum.
An example after visualizing the data produced by ISA.

Suppose you have a cluster and would like to start monitoring it as soon as possible, without installing all kinds of tools on it. A new software package named ISA has been created that can do centralized monitoring for you! This article is a walkthrough of ISA and helps you set up monitoring for your cluster in just a few minutes.

Features

  • ISA can collect many node statistics such as CPU usage, memory usage and disk I/O.
  • It is easy to set up and it has flexible node configuration.
  • ISA has minimal influence on the node statistics it measures.
  • No setup is required on the nodes; the statistics management is done centrally.

In this tutorial, we will set up ISA and collect cluster statistics in a CSV file.
