Data Blogger Courses

Mastering Pandas

In this course, you will learn how to use the Python library Pandas. After the course, you will be able to:

  • Load and transform your data
  • Visualize data using line plots, scatter plots and histograms
  • Merge and store data

The course also includes more advanced topics, such as data parallelization and aggregation.
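To give a rough idea of what loading, transforming, merging, aggregating and storing data looks like in Pandas, here is a minimal sketch. The tables, column names and values are made up for illustration and are not taken from the course; plotting is left out to keep the example short.

```python
import io

import pandas as pd

# Hypothetical inline CSV data standing in for files on disk.
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "1,10,25.0\n"
    "2,11,40.0\n"
    "3,10,15.0\n"
)
customers_csv = io.StringIO(
    "customer_id,name\n"
    "10,Alice\n"
    "11,Bob\n"
)

# Load: read both tables into DataFrames.
orders = pd.read_csv(orders_csv)
customers = pd.read_csv(customers_csv)

# Transform: add a derived boolean column.
orders["is_large"] = orders["amount"] >= 20

# Merge: attach the customer name to every order.
merged = orders.merge(customers, on="customer_id", how="left")

# Aggregate: total order amount per customer.
totals = merged.groupby("name", as_index=False)["amount"].sum()

# Store: write the result as CSV (to an in-memory buffer here).
output = io.StringIO()
totals.to_csv(output, index=False)
csv_text = output.getvalue()
print(csv_text)
```

In practice you would pass file paths to `pd.read_csv` and `to_csv` instead of in-memory buffers.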

You can see all course content under “Curriculum” on Data Blogger Courses, and the first three lessons are free. The first free lesson can be found here.

How to scrape a website using Python + Scrapy in 5 simple steps

In this Python Scrapy tutorial, you will learn how to write a simple web scraper in Python using the Scrapy framework. The Data Blogger website will be used as an example in this article.

Scrapy: An open source and collaborative framework for extracting the data you need from websites. In a fast, simple, yet extensible way.

By the way, if you are interested in scraping Tweets, you should definitely read this article.

Monitoring your cluster in just a few minutes using ISA

Suppose you have a cluster and would like to start monitoring it as soon as possible, without installing all kinds of tools on it. A new software package named ISA does this centralized monitoring for you. This article is a walkthrough of ISA and helps you set up monitoring for your cluster in just a few minutes.

Features

  • ISA can collect many node statistics such as CPU usage, memory usage and disk I/O.
  • It is easy to set up and has flexible node configuration.
  • ISA keeps its own influence on the node statistics minimal.
  • No setup is required on the nodes; the statistics are managed centrally.

In this tutorial, we will set up ISA and collect cluster statistics in a CSV file.
