Python Pandas Tutorial: The Basics

In this Python Pandas tutorial, you will learn the basics of Pandas through code examples written in Python. If you are completely new to Python, please read this article first.

Pandas Tutorial: What is Pandas?

Pandas is an open-source Python library that provides data structures and data analysis tools.

Constructing a DataFrame

The DataFrame is the main data structure used in Pandas. Personally, I think of it as a database on which I can execute queries. A DataFrame is constructed as follows:

import pandas as pd

my_data = [
    ['Kevin', 26, 'https://www.data-blogger.com/'],
    ['Sundar Pichai', 45, 'https://www.google.com/'],
    ['Mark Zuckerberg', 33, 'https://www.facebook.com/']
]

df = pd.DataFrame(my_data, columns=['Name', 'Age', 'URL'])
print(df)

There are a few things worth mentioning. Pandas is usually imported as pd (just as NumPy is usually imported as np), and in other people's code you will often see a DataFrame stored in a variable named df. Here, the DataFrame is constructed from a list of lists, where each inner list is one row. The columns argument specifies the column names. The result is the following DataFrame:

A DataFrame.
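
To check what the constructor produced, a short sketch like the one below inspects the shape, the column names, and the inferred column types. It reuses the data from the example above. The exact type names printed by dtypes can differ between Pandas versions, so no output is shown for that line.

```python
import pandas as pd

my_data = [
    ['Kevin', 26, 'https://www.data-blogger.com/'],
    ['Sundar Pichai', 45, 'https://www.google.com/'],
    ['Mark Zuckerberg', 33, 'https://www.facebook.com/']
]

df = pd.DataFrame(my_data, columns=['Name', 'Age', 'URL'])

# Number of rows and columns as a (rows, columns) tuple
print(df.shape)

# Column names, in the order passed to the columns argument
print(list(df.columns))

# The type Pandas inferred for each column
print(df.dtypes)
```

Checking the shape and column names right after construction is a cheap way to catch a row with too many or too few values.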

Instead of using a list of lists, we could have initialized the DataFrame from a list of dictionaries, with one dictionary per row. Many people find this more convenient:

import pandas as pd

my_data = [
    {    
        'Name': 'Kevin',
        'Age': 26,
        'URL': 'https://www.data-blogger.com/'
    },
    {    
        'Name': 'Sundar Pichai',
        'Age': 45,
        'URL': 'https://www.google.com/'
    },
    {    
        'Name': 'Mark Zuckerberg',
        'Age': 33,
        'URL': 'https://www.facebook.com/'
    }
]

df = pd.DataFrame(my_data)
print(df)

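The convenience here is that you do not need to specify the column names during the DataFrame construction, because they are taken from the dictionary keys. The downside is that it becomes tedious to keep track of all the keys, since every dictionary has to spell them identically.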
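
To illustrate why the keys matter, here is a minimal sketch of what happens when one dictionary lacks a key. The names records and df_records are made up for this example, and the second record deliberately has no 'URL' entry.

```python
import pandas as pd

# Hypothetical records: the second dictionary has no 'URL' key
records = [
    {'Name': 'Kevin', 'Age': 26, 'URL': 'https://www.data-blogger.com/'},
    {'Name': 'Sundar Pichai', 'Age': 45},
]

df_records = pd.DataFrame(records)

# The columns are the union of all keys found in the dictionaries
print(sorted(df_records.columns))

# Pandas does not raise an error; the missing entry becomes a missing value
print(df_records['URL'].isna().tolist())  # [False, True]
```

Pandas fills the gap silently, so a misspelled key would show up as an extra column full of missing values rather than as an error.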

Query the Data

It is easy to select data from a single column:

print(df['Age'])

Selecting multiple columns is also easy:

print(df[['Name', 'Age']])

Selecting some rows is also not hard:

# Select the first value of the Age column
print(df['Age'].iloc[0])

# Select the first two rows
print(df.iloc[:2])
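
The selections above return different kinds of objects, which the following sketch makes explicit. It rebuilds the DataFrame from the construction example so that it runs on its own. The last step, filtering rows with a boolean condition, goes slightly beyond the selections shown above and is included only as an illustration of a query on values.

```python
import pandas as pd

my_data = [
    ['Kevin', 26, 'https://www.data-blogger.com/'],
    ['Sundar Pichai', 45, 'https://www.google.com/'],
    ['Mark Zuckerberg', 33, 'https://www.facebook.com/']
]
df = pd.DataFrame(my_data, columns=['Name', 'Age', 'URL'])

ages = df['Age']               # one column name: a Series
subset = df[['Name', 'Age']]   # a list of column names: a DataFrame
first_age = df['Age'].iloc[0]  # position 0 of a Series: a single value
first_two = df.iloc[:2]        # a slice of positions: a DataFrame

print(type(ages).__name__)     # Series
print(type(subset).__name__)   # DataFrame
print(first_age)               # 26
print(len(first_two))          # 2

# Illustration only: keep the rows where Age is greater than 30
older = df[df['Age'] > 30]
print(older['Name'].tolist())  # ['Sundar Pichai', 'Mark Zuckerberg']
```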

Plot the Data

It is fairly easy to make plots of the data using the DataFrame object. For example, it is straightforward to make a histogram out of the ages:

Histogram.

This is created by the following code, which is a piece of cake (note that Pandas needs Matplotlib to be installed for plotting):

df['Age'].hist()
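
In a notebook the one-liner displays the figure directly. In a plain script you typically have to show or save the figure yourself. The sketch below is one way to do that, assuming Matplotlib is installed. The non-interactive Agg backend, the bins value, the axis labels, and the output file name are all choices made for this example and are not part of the original code.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: render without opening a window
import pandas as pd

my_data = [
    ['Kevin', 26, 'https://www.data-blogger.com/'],
    ['Sundar Pichai', 45, 'https://www.google.com/'],
    ['Mark Zuckerberg', 33, 'https://www.facebook.com/']
]
df = pd.DataFrame(my_data, columns=['Name', 'Age', 'URL'])

# hist() returns the Matplotlib Axes it drew on
ax = df['Age'].hist(bins=3)
ax.set_xlabel('Age')
ax.set_ylabel('Count')

# Hypothetical output location in the system's temporary directory
output_path = os.path.join(tempfile.gettempdir(), 'age_histogram.png')
ax.get_figure().savefig(output_path)
print(output_path)
```

For interactive use, dropping the Agg line and calling matplotlib.pyplot.show() instead of saving opens the figure in a window.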

Conclusions (TL;DR)

Pandas is an absolute must-have library for any data science project. This tutorial covers only the very basics of Pandas. A more detailed tutorial is planned for the future! If you have any questions or comments, feel free to leave them below.

Kevin Jacobs

Kevin Jacobs is a certified Data Scientist and blog writer for Data Blogger. He is passionate about any project that involves large amounts of data and statistical data analysis. Kevin can be reached using Twitter (@kmjjacobs), LinkedIn or via e-mail: mail@kevinjacobs.nl.