Python Pandas Tutorial: The Basics

In this Python Pandas tutorial, you will learn the basics of Pandas through code examples written in Python. If you have no prior knowledge of Python, please read this article first.

Pandas Tutorial: What is Pandas?

Pandas is an open-source library for Python that provides data structures and data analysis tools.

Constructing a DataFrame

The DataFrame is the main data structure used in Pandas. Personally, I think of it as a database on which I can execute queries. A DataFrame is constructed as follows:


import pandas as pd

my_data = [
    ['Kevin', 26, 'https://www.data-blogger.com/'],
    ['Sundar Pichai', 45, 'https://www.google.com/'],
    ['Mark Zuckerberg', 33, 'https://www.facebook.com/']
]

df = pd.DataFrame(my_data, columns=['Name', 'Age', 'URL'])
print(df)

A few things are worth mentioning. Pandas is conventionally imported as pd (just as NumPy is usually imported as np), and in other people's code you will often see a DataFrame named df. Here, the DataFrame is constructed from a list of lists, where each inner list is one row. The columns argument specifies the column names. The result is the following DataFrame:

A DataFrame.
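
To double-check what the constructor produced, it can help to inspect the DataFrame directly. The following is a minimal sketch that rebuilds the same data and looks at the column names, the shape and the inferred types. The helper variable column_names is only there for illustration.

```python
import pandas as pd

my_data = [
    ['Kevin', 26, 'https://www.data-blogger.com/'],
    ['Sundar Pichai', 45, 'https://www.google.com/'],
    ['Mark Zuckerberg', 33, 'https://www.facebook.com/']
]
df = pd.DataFrame(my_data, columns=['Name', 'Age', 'URL'])

# The column names are exactly the ones passed via the columns argument
column_names = list(df.columns)
print(column_names)

# shape is a (rows, columns) tuple
print(df.shape)

# dtypes shows the type Pandas inferred for each column
print(df.dtypes)
```

If the number of names in columns does not match the length of the inner lists, Pandas raises an error, so this kind of check is mostly useful for confirming the types.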

Instead of a list of lists, we could have initialized the DataFrame from a list of dictionaries, one dictionary per row. Many people find this more convenient:

import pandas as pd

my_data = [
    {    
        'Name': 'Kevin',
        'Age': 26,
        'URL': 'https://www.data-blogger.com/'
    },
    {    
        'Name': 'Sundar Pichai',
        'Age': 45,
        'URL': 'https://www.google.com/'
    },
    {    
        'Name': 'Mark Zuckerberg',
        'Age': 33,
        'URL': 'https://www.facebook.com/'
    }
]

df = pd.DataFrame(my_data)
print(df)

The convenience here is that you do not need to specify the column names when constructing the DataFrame, because they are taken from the dictionary keys. The downside is that keeping track of all the keys becomes tedious, since every dictionary has to repeat them.
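
One reason the keys matter: if a dictionary lacks a key, Pandas does not complain but fills the gap with NaN. Here is a small sketch of that behaviour. The rows list and the name df_missing are made up for this illustration, and the column order may differ between Pandas versions, so the names are sorted before printing.

```python
import pandas as pd

# Hypothetical rows: the second dictionary has no 'Age' key
rows = [
    {'Name': 'Kevin', 'Age': 26},
    {'Name': 'Sundar Pichai'},
]
df_missing = pd.DataFrame(rows)

# Column names are taken from the dictionary keys
print(sorted(df_missing.columns))

# The missing value shows up as NaN in the Age column
print(df_missing['Age'].isna().tolist())
```

A misspelled key would be handled in a similar way: it would silently become an extra column, which is worth keeping in mind when typing many dictionaries by hand.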

Query the Data

It is easy to select data from a single column:

print(df['Age'])

Selecting multiple columns is also easy:

print(df[['Name', 'Age']])

Selecting rows by position is not hard either:

# Select the first row of the Age column
print(df['Age'].iloc[0])

# Select the first two rows
print(df.iloc[:2])
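
Rows can also be selected by a condition instead of by position, which comes closer to querying a database. Below is a minimal sketch using a boolean mask. The threshold of 30 and the names older_than_30 and names are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['Kevin', 26, 'https://www.data-blogger.com/'],
        ['Sundar Pichai', 45, 'https://www.google.com/'],
        ['Mark Zuckerberg', 33, 'https://www.facebook.com/'],
    ],
    columns=['Name', 'Age', 'URL'],
)

# Build a boolean mask: True for rows where Age is above 30
older_than_30 = df['Age'] > 30

# Use the mask to keep only the matching rows
print(df[older_than_30])

# .loc combines a row condition with a column selection
names = df.loc[older_than_30, 'Name'].tolist()
print(names)
```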

Plot the Data

It is fairly easy to make plots of the data using the DataFrame object. For example, it is straightforward to make a histogram out of the ages:

Histogram.

This is created by the following code, which is a piece of cake. Note that Pandas relies on Matplotlib for plotting, so Matplotlib must be installed:

df['Age'].hist()
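
When the code runs as a plain script rather than in a notebook, the plot may not appear on its own. One possible approach, sketched here under the assumption that Matplotlib is installed and no display is available, is to save the figure to a file. The bins value, the axis labels and the output file name are choices made for this example only.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: no display available, so render off-screen

import pandas as pd

df = pd.DataFrame({'Age': [26, 45, 33]})

# Series.hist draws the histogram and returns a Matplotlib Axes object
ax = df['Age'].hist(bins=5)
ax.set_xlabel('Age')
ax.set_ylabel('Count')

# Hypothetical output location; any writable path would do
output_path = os.path.join(tempfile.gettempdir(), 'age_histogram.png')
ax.get_figure().savefig(output_path)
print(output_path)
```

On a machine with a display, calling plt.show() from matplotlib.pyplot instead of saving the file should open the plot in a window.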

Conclusions (TL;DR)

Pandas is a must-have library for data science projects. This tutorial covered only the very basics: constructing a DataFrame, selecting columns and rows, and plotting a histogram. Are you interested in more Pandas and do you want to learn advanced Pandas operations? Or do you want to set up your data pipelines in Pandas? Then please check out our new Pandas course!

Kevin Jacobs

I'm Kevin, a Data Scientist, PhD student in NLP and Law and blog writer for Data Blogger. You can reach me via Twitter (@kmjjacobs) or LinkedIn.