Monitoring your cluster in just a few minutes using ISA

CPU Sum.

An example after visualizing the data produced by ISA.

Suppose you have a cluster. Suppose you would like to monitor your cluster as soon as possible without installing all kind of tools on the cluster. A new software package named ISA has been created which can do centralized monitoring for you! This article is a walkthrough for ISA and helps you setting up monitoring for your cluster in just a few minutes.

Features

  • ISA can collect many node statistics such as CPU usage, memory usage and disk I/O.
  • It is easy to setup and it has flexible node configuration.
  • ISA ensures minimal influence for the node statistics.
  • No setup required on the nodes, the statistic management is done centrally.

In this tutorial, we will setup ISA and collect cluster statistics in a CSV.

Why ISA?

ISA is easy to setup and is very useful for monitoring a cluster for a short amount of time and gather as much as information as possible. Below, some example visualizations of the CSV file produced by ISA are shown.

Visualizing incoming network traffic.

Visualizing incoming network traffic from 13 different nodes.

The memory usage of the entire cluster (consisting of 13 nodes) over time.

The memory usage of the entire cluster (consisting of 13 nodes) over time.

Disk usage of all the nodes of the cluster.

Disk I/O of all the nodes of the cluster.

Installing ISA

For installing ISA, you just need Python and Pip. ISA can be installed using the following command:

pip install isa

Et voila! You have installed ISA. That was only half a minute of work.

Configuring your cluster

ISA needs to know where your servers are and how ISA can connect to these servers. No worries, this will all happen on your local machine and it is easy to configure. It is even possible to specify connections via another machine (for example, if you need to connect via a proxy or via a login node).

Suppose you have two machines, server-a and server-b and you can login on server-a with username “user-a” and password “password-a” and on server-b you can login with username “user-b” and password “password-b”. Furthermore, server-a is located at IP address 127.0.0.1 and server-b is located at www.mydomain.com. You need to create one INI file specifying all of this. In my example, I have created “nodes.ini” with the following content:

[server-a]
host = 127.0.0.1
username = user-a
password = password-a

[server-b]
host = www.mydomain.com
username = user-b
password = password-b
nodes.ini

Gathering resources

Now we have the nodes.ini file, we are ready to use ISA for collecting statistics. If you want to check if everything works fine, you can use the standard output mode which displays the statistics in your terminal. This can be done as follows:

isa nodes.ini -vv

That was very easy! Now, if you want to write the results to a CSV file, you can do the following. Make sure that the output file ends on .csv:

isa nodes.ini --out results.csv -vv

The -vv flag is optional, but provides additional information about the run. I got the following results in results.csv:

collectl::Buff,collectl::Cach,collectl::Comm,collectl::Cpu0,collectl::Cpu1,collectl::Cpu2,collectl::Cpu3,collectl::Frag,collectl::Fragments,collectl::Free,collectl::Handle,collectl::IP,collectl::Icmp,collectl::Inac,collectl::Inodes,collectl::KBIn,collectl::KBOut,collectl::KBRead,collectl::KBWrit,collectl::Map,collectl::Meta,collectl::PktIn,collectl::PktOut,collectl::Raw,collectl::Reads,collectl::Slab,collectl::Tcp,collectl::Udp,collectl::Writes,collectl::cpu,collectl::ctxsw,collectl::inter,collectl::sys,node,plugin,time
23M,1G,0,679,927,666,617,0,tssslja2213,267M,8224,0,2,646M,116200,14,8,0,34,4G,0,76,62,0,0,464M,470,0,0,14,11123,2889,2,server-a,collectl,2016-07-18 19:12:24.247616
23M,1G,0,929,779,669,565,0,tssslja2213,262M,8224,0,2,651M,116296,11,9,1,54,4G,0,75,61,0,0,464M,466,0,0,16,10992,2942,3,server-a,collectl,2016-07-18 19:12:35.246169
23M,1G,0,795,1018,767,849,0,tssslja2213,259M,8192,0,2,651M,116325,14,8,0,125,4G,0,75,58,0,0,464M,468,0,0,14,11647,3430,2,server-a,collectl,2016-07-18 19:12:46.246440
23M,1G,0,911,945,772,531,0,tssslja2213,260M,8224,0,5,652M,116321,9,7,0,2,4G,0,67,56,0,0,464M,463,0,0,13,11216,3161,2,server-a,collectl,2016-07-18 19:12:57.245966

Since I use only one plugin (the default plugin called collectl), I can see that the plugin field in the CSV is everywhere set to this plugin. I can also see statistics like the Cpu1 flag of collectl. For more information of the fields, one can look up the documentation of collectl. Note that this CSV can be used by any other type of application.

More flags

It is possible to specify how often statistics are collected. This is done by setting the “interval” flag which specifies the number of seconds ISA will collect statistics. For example, if you would like to gather statistics from every 10 seconds, you can do the following:

isa nodes.ini --interval 10 -vv

It is also possible to specify how long ISA will try to connect to your server using the “timeout” flag (also specified in seconds):

isa nodes.ini --timeout 5 -vv

Additional resources

ISA can be found on GitHub. Its documentation can be found on ReadTheDocs which explains all the details.

Conclusion (TL;DR)

If you have a short timespan and you want to measure statistics of a cluster, definitely use ISA. It is easy to setup and easy to configure and results are obtained within a few seconds.

Kevin Jacobs

Kevin Jacobs

Kevin Jacobs is a certified Data Scientist and blog writer for Data Blogger. He is passionate about any project that involves large amounts of data and statistical data analysis. Kevin can be reached using Twitter (@kmjjacobs), LinkedIn or via e-mail: mail@kevinjacobs.nl.