
Python Pandas Tutorial – Learn Pandas in Python


In our last Python Library tutorial, we discussed Python Scipy. Today, we will look at a Python Pandas Tutorial. In this Pandas tutorial, we will learn the exact meaning of Pandas in Python. Moreover, we will see the features, installation, and dataset in Pandas. Along with this, we will discuss Pandas data frames and how to manipulate the dataset in Python Pandas. Also, we will discuss Pandas examples and some terms, such as ranking, series, and panels.

So, let’s start the Python Pandas Tutorial.


What is Pandas in Python?

Pandas is an open-source Python library that you can use to manipulate and analyze data. With the data structures and operations it has to offer, you can work with time series and numerical tables.


Here are a few facts about it-

Pandas is released under a three-clause BSD license and is free to download, use, and distribute. The name derives from "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals.

Benefits of using pandas:

Python Pandas Features

Pandas makes it easier to clean and wrangle your data, which makes it a great choice for data science and analysis. Using it with libraries like NumPy and Matplotlib makes it all the more useful.

How to Install Pandas?

Below are the steps to install Pandas in Python:

a. Installing Pandas

To install pandas, you can use pip-

pip install pandas

b. Importing Pandas

Now let’s import this using an alias-

>>> import pandas as pd

This lets us refer to pandas as pd in the rest of the code.

c. Importing a Dataset

You can use the function read_csv() to read a CSV file into a DataFrame. Let’s import the furniture dataset.

>>> furniture=pd.read_csv('furniture.csv')
>>> furniture
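The file furniture.csv is not reproduced in this tutorial, so here is a minimal sketch that builds a stand-in CSV in memory. The rows are hypothetical; only the column names and the shape match what the tutorial shows. It also illustrates where a column like 'Unnamed: 0' usually comes from (a first column with no header, often a saved index) and how index_col=0 avoids it.

```python
import io
import pandas as pd

# Hypothetical stand-in for furniture.csv; the real file's rows are not shown in the tutorial.
csv_text = """,Product,Brand,Cost
0,Sofa,Ikea,30000
1,Bed,Ikea,50000
2,Chair,Ashley,4000
3,Table,Ashley,12000
4,Desk,Wayfair,8000
"""

# A first column without a header name is read as 'Unnamed: 0'.
furniture = pd.read_csv(io.StringIO(csv_text))
print(furniture.columns.tolist())

# index_col=0 tells read_csv to use that column as the row index instead.
furniture_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(furniture_indexed.shape)
```

With a real file on disk, you would pass its path in place of the StringIO object.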


Dataset in Pandas

Let’s look at the ways to inspect a dataset in pandas:

a. Column names

The following command will give us all the column names-

>>> furniture.columns

Index(['Unnamed: 0', 'Product', 'Brand', 'Cost'], dtype='object')
We can slice it-

>>> furniture.columns[0:2]

Index(['Unnamed: 0', 'Product'], dtype='object')

b. Data types

>>> furniture.dtypes

Unnamed: 0     int64
Product       object
Brand         object
Cost           int64
dtype: object
To find out more about data types, read up on NumPy with Python. Let’s find out the data types of one column.

>>> furniture['Brand'].dtypes

dtype('O')
O denotes an object.

c. Shape

To find out the shape of your data set, you can use the shape attribute, which is a tuple of (rows, columns)-

>>> furniture.shape

(5, 4)
Number of rows-

>>> furniture.shape[0]

5
Number of columns-

>>> furniture.shape[1]

4

d. Individual rows

The head() method gives us the first 5 rows of the data set by default, but we can also ask for fewer or more.

>>> furniture.head(3)


Similarly, tail() gives us the last rows-

>>> furniture.tail(2)


e. Unique values

We can use the unique() method when we want to see the distinct values in an index or a column.

>>> furniture.index.unique()

Int64Index([0, 1, 2, 3, 4], dtype='int64')
And to find out how many, we make a call to nunique().

>>> furniture.index.nunique()

5
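The same two methods are more commonly applied to a column than to the index. Here is a small sketch; the furniture rows are made up for illustration.

```python
import pandas as pd

# Hypothetical furniture data for illustration.
furniture = pd.DataFrame({
    'Product': ['Sofa', 'Bed', 'Chair', 'Table', 'Desk'],
    'Brand': ['Ikea', 'Ikea', 'Ashley', 'Ashley', 'Wayfair'],
})

brands = furniture['Brand'].unique()     # distinct values, in order of first appearance
n_brands = furniture['Brand'].nunique()  # how many distinct values there are
print(brands, n_brands)
```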

DataFrames in Pandas

A DataFrame is an essential data structure in pandas. It lets us deal with data in a tabular fashion. The rows are observations, and the columns are variables.

We have the following syntax for this-

pandas.DataFrame(data, index, columns, dtype, copy)

Such a data structure is size-mutable, has labeled rows and columns, and can hold columns of different types.

a. Creating a DataFrame

Let’s see how we can create a DataFrame.

>>> df=pd.DataFrame({'company':['Amazon','Apple','Google','Facebook','Microsoft'],
    'CEO':['Jeff Bezos','Tim Cook','Sundar Pichai','Mark Zuckerberg','Satya Nadella'],
    'Founded':[1994,1976,1998,2004,1975]})
>>> df


b. Setting Indexes for a DataFrame

By default, the rows of a DataFrame are indexed with integers starting at 0. But we can put labels on these. Let’s see how we can index it based on which company came first.

>>> df.index=['Third','Second','Fourth','Fifth','First']
>>> df


c. Indexing a DataFrame

To select a single column-

>>> df['company']

Third         Amazon
Second         Apple
Fourth        Google
Fifth       Facebook
First      Microsoft
Name: company, dtype: object
This returns a Series. To get a DataFrame instead, we can pass a list of column names:

>>> df[['company']]

          company
Third      Amazon
Second      Apple
Fourth     Google
Fifth    Facebook
First   Microsoft

>>> df[['company','Founded']]


d. Slicing a DataFrame

It is possible to slice a DataFrame to retrieve rows from it.

>>> df[0:3]


e. More data selection operations

Using loc and iloc, you can select certain rows and columns in a data set. loc selects by label; iloc selects by integer position.

>>> df.loc[['Second','Fifth']]


>>> df.iloc[3]


Getting more than one column-

>>> df.iloc[:,1:4]
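To make the difference between loc and iloc concrete, here is a short sketch on a cut-down version of the companies DataFrame. One detail worth noticing: label slices with loc include both endpoints, while positional slices with iloc exclude the stop, like ordinary Python slices.

```python
import pandas as pd

# A smaller version of the companies DataFrame used above.
df = pd.DataFrame(
    {'company': ['Amazon', 'Apple', 'Google'],
     'Founded': [1994, 1976, 1998]},
    index=['Third', 'Second', 'Fourth'],
)

by_label = df.loc['Second', 'Founded']  # row label, column label
by_position = df.iloc[1, 1]             # row position, column position

label_slice = df.loc['Third':'Second']  # both endpoints included
position_slice = df.iloc[0:1]           # stop position excluded
print(by_label, by_position, len(label_slice), len(position_slice))
```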


Manipulating the Datasets in Pandas

So far, we’ve seen how we can find out more about a dataset and how to set an index on it. Now let’s see how we can change it.

a. Changing the data type

Let’s use the furniture dataset for this.

>>> furniture.Cost=furniture.Cost.astype(float)
>>> furniture


b. Creating a frequency distribution

For this purpose, we have the method value_counts().

>>> furniture.index=['A','B','A','A','C']
>>> furniture.index.value_counts(ascending=True)

C    1
B    1
A    3
dtype: int64
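value_counts() works the same way on a column. The sketch below uses a made-up Brand column and also shows normalize=True, which returns relative frequencies instead of counts.

```python
import pandas as pd

# Hypothetical brand column for illustration.
brands = pd.Series(['Ikea', 'Ikea', 'Ashley', 'Ikea', 'Wayfair'], name='Brand')

counts = brands.value_counts()                # most frequent value first
shares = brands.value_counts(normalize=True)  # fractions that sum to 1
print(counts)
print(shares)
```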

c. Creating a crosstab

A crosstab creates a bivariate frequency distribution.

>>> pd.crosstab(furniture.index,furniture.Brand)
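Here is a self-contained sketch of a crosstab between two columns. The rows are hypothetical. Each cell counts how many rows have that combination of Product and Brand.

```python
import pandas as pd

# Hypothetical furniture data for illustration.
furniture = pd.DataFrame({
    'Product': ['Sofa', 'Bed', 'Sofa', 'Chair', 'Bed'],
    'Brand': ['Ikea', 'Ikea', 'Ashley', 'Ashley', 'Ikea'],
})

# Rows are the distinct products, columns are the distinct brands.
table = pd.crosstab(furniture['Product'], furniture['Brand'])
print(table)
```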


d. Choosing one column as an index

You can choose one of the columns in your dataset to serve as the index for the rows.

>>> df.set_index('company',inplace=True)
>>> df


To reset this, you can:

>>> df.reset_index(inplace=True)
>>> df


e. Sorting data

For this, we use the method sort_values(). It returns a sorted copy and leaves the original DataFrame unchanged.

>>> furniture.sort_values('Cost',ascending=False)


f. Renaming variables

Let’s rename the variable ‘company’ to ‘Company’.

>>> df.columns=['Company','CEO','Founded']
>>> df


Or we can rename just one column with rename():

>>> furniture.rename(columns={'Product':'Category'},inplace=True)
>>> furniture


g. Dropping rows and columns

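It is possible to drop any number of rows and columns you want. Use axis=1 for columns and axis=0 (the default) for rows. Note that drop() returns a new DataFrame and leaves the original unchanged unless you pass inplace=True.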

>>> furniture.drop('Cost',axis=1)
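A short sketch of dropping a column and a row, on made-up data. The columns= and index= keywords are an alternative to the axis argument and can be easier to read.

```python
import pandas as pd

# Hypothetical furniture data for illustration.
furniture = pd.DataFrame({
    'Product': ['Sofa', 'Bed', 'Chair'],
    'Cost': [30000, 50000, 4000],
})

without_cost = furniture.drop(columns='Cost')  # same as drop('Cost', axis=1)
without_first = furniture.drop(index=0)        # drop a row by its index label

# drop() returned new DataFrames, so furniture itself still has every row and column.
print(furniture.shape, without_cost.shape, without_first.shape)
```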


h. Creating new variables

Now, let’s add 10% of the cost to itself and find out the gross amount.

>>> furniture['Gross']=furniture.eval('Cost+(Cost*(0.1))')
>>> furniture
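eval() is one way to do this; plain column arithmetic is another and is probably the more common one. The sketch below computes the same gross amount both ways on hypothetical costs. The column name Gross_eval is made up here just to hold the second result.

```python
import pandas as pd

# Hypothetical costs for illustration.
furniture = pd.DataFrame({'Cost': [30000.0, 50000.0, 4000.0]})

# Ordinary arithmetic on a column works element by element.
furniture['Gross'] = furniture['Cost'] * 1.1

# The eval() form from the tutorial, stored under a second (made-up) name.
furniture['Gross_eval'] = furniture.eval('Cost + Cost * 0.1')
print(furniture)
```

The two columns can differ in the last floating-point digits, so compare them with a tolerance rather than with ==.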


Describing a Dataset in Pandas

With the describe() method, we can get summary statistics for the numeric columns of a dataset: count, mean, standard deviation, min, max, and quartiles.

>>> furniture.describe()


>>> furniture.Gross.max()

55000.0

Pandas groupby Function

Generally, this operation lets you group data on a variable.

>>> furniture.groupby('Category').Gross.min()


agg() lets us compute several aggregates, like count and min, in one call.

>>> furniture.groupby('Category').Gross.agg(['count','min','max','mean'])
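Here is a runnable sketch of the same idea on made-up data. It also shows named aggregation, which lets you choose the names of the result columns; cheapest and priciest are names invented for this example.

```python
import pandas as pd

# Hypothetical data for illustration.
furniture = pd.DataFrame({
    'Category': ['Sofa', 'Bed', 'Sofa', 'Bed'],
    'Gross': [33000.0, 55000.0, 22000.0, 44000.0],
})

# One row per category, one column per aggregate.
summary = furniture.groupby('Category')['Gross'].agg(['count', 'min', 'max', 'mean'])

# Named aggregation: result_column=(source_column, function).
named = furniture.groupby('Category').agg(
    cheapest=('Gross', 'min'),
    priciest=('Gross', 'max'),
)
print(summary)
print(named)
```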


Filtering in Python Pandas

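You can filter rows in two ways. These examples assume that furniture still has its default integer index (0 to 4), not the letter labels assigned in the frequency distribution example-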

>>> furniture[furniture.index==2]


>>> furniture.loc[furniture.index==2,:]

You can also combine conditions, or use isin() to match any of several values:

>>> furniture[furniture.index.isin([1,3])]
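Combining conditions might look like the sketch below, which filters on column values rather than on the index. The data is hypothetical. Each condition needs its own parentheses, and you use & for "and" and | for "or" instead of the Python keywords.

```python
import pandas as pd

# Hypothetical furniture data for illustration.
furniture = pd.DataFrame({
    'Product': ['Sofa', 'Bed', 'Chair', 'Table'],
    'Brand': ['Ikea', 'Ikea', 'Ashley', 'Ashley'],
    'Cost': [30000, 50000, 4000, 12000],
})

# Both conditions must hold.
pricey_ikea = furniture[(furniture['Brand'] == 'Ikea') & (furniture['Cost'] > 40000)]

# Either condition may hold.
cheap_or_ashley = furniture[(furniture['Cost'] < 10000) | (furniture['Brand'] == 'Ashley')]
print(pricey_ikea)
print(cheap_or_ashley)
```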


Missing Values in Pandas

isnull() tells us, cell by cell, whether a value is missing.

>>> furniture.isnull()


Conversely, notnull() returns False for a NaN and True otherwise.
To count the missing values in each column-

>>> furniture.isnull().sum()


To drop a missing value, you can use dropna(), and to fill it, use fillna().
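The tutorial does not show dropna() and fillna() in action, so here is a minimal sketch on made-up data with one missing cost. Filling with the column mean is just one possible choice, picked for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Cost.
furniture = pd.DataFrame({
    'Product': ['Sofa', 'Bed', 'Chair'],
    'Cost': [30000.0, np.nan, 4000.0],
})

missing_per_column = furniture.isnull().sum()  # count of NaN values per column
dropped = furniture.dropna()                   # removes rows that contain any NaN
filled = furniture.fillna({'Cost': furniture['Cost'].mean()})  # mean() skips NaN
print(missing_per_column)
print(dropped)
print(filled)
```

Both methods return new DataFrames, so furniture itself keeps its NaN.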

Ranking in Python Pandas

Now, to rank every variable according to its value, we can use rank().

>>> furniture.rank()
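To see what rank() does with ties, here is a small sketch on a made-up Series of costs. By default, tied values share the average of their ranks; method='dense' gives them the same rank with no gaps afterwards.

```python
import pandas as pd

# Hypothetical costs; 30000 appears twice to show how ties are handled.
cost = pd.Series([30000, 50000, 4000, 30000])

default_rank = cost.rank()                               # ties get the average rank
dense_desc = cost.rank(method='dense', ascending=False)  # highest value ranks first
print(default_rank.tolist())
print(dense_desc.tolist())
```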


Concatenating DataFrames in Pandas

So, with the concat() method, we can concatenate two or more DataFrames.

>>> pd.concat([df,furniture])


Let’s see what happens when we concatenate the result with df once more.

>>> pd.concat([df,furniture,df])
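Two things are worth knowing when the DataFrames do not share the same columns or index, as with df and furniture. The sketch below uses two tiny made-up frames, a and b, to show them: columns missing from one input are filled with NaN, and the original index labels are kept unless you pass ignore_index=True.

```python
import pandas as pd

# Two small hypothetical frames with only partly matching columns.
a = pd.DataFrame({'x': [1, 2]})
b = pd.DataFrame({'x': [3], 'y': ['only in b']})

stacked = pd.concat([a, b])                        # keeps each frame's own index labels
renumbered = pd.concat([a, b], ignore_index=True)  # builds a fresh 0..n-1 index
print(stacked)
print(renumbered)
```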


Series in Pandas

Now, another important data structure in pandas is a Series. This is a one-dimensional labeled array. It can hold any kind of data; when the values are of mixed types, as below, they are stored with the object dtype.

>>> pd.Series([2,4,'c'])

0     2
1     4
2     c
dtype: object

>>> pd.Series({1:'a',2:'b'})

1     a
2     b
dtype: object
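The labels are what set a Series apart from a plain list. Here is a short sketch that builds one with an explicit index and then selects by label and by condition. It reuses the founding years from the companies example.

```python
import pandas as pd

# Founding years labeled by company name.
founded = pd.Series([1994, 1976, 1998], index=['Amazon', 'Apple', 'Google'])

apple_year = founded['Apple']    # look up a value by its label
older = founded[founded < 1995]  # boolean filtering keeps the labels
print(apple_year)
print(older)
```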

Panels in Pandas

Finally, we come to panels. A panel holds data in 3 dimensions. As we said above, the name pandas comes from "panel data". A declaration for a panel takes in three parameters- items, major_axis, and minor_axis. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the code below only runs on older versions.

>>> import numpy as np
>>> pd.Panel(np.random.rand(2,4,5))
<class 'pandas.core.panel.Panel'>

Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4
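pd.Panel is no longer available in current pandas releases (it was removed in version 0.25). One common substitute is a DataFrame with a two-level MultiIndex on the rows. The sketch below is one way to hold the same 2 x 4 x 5 array; the level names item and major are chosen here just to mirror the panel axes.

```python
import numpy as np
import pandas as pd

data = np.random.rand(2, 4, 5)  # items x major_axis x minor_axis

# Flatten the first two axes into a two-level row index.
index = pd.MultiIndex.from_product([range(2), range(4)], names=['item', 'major'])
frame = pd.DataFrame(data.reshape(8, 5), index=index)

first_item = frame.loc[0]  # the 4 x 5 table for item 0
print(frame.shape, first_item.shape)
```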

So, this was all in the Python pandas Tutorial. Hope you like our explanation.

Conclusion

In this Python Pandas Tutorial, we learned what pandas is and how to use it. We discussed its features, installation, and how to inspect data sets. We also saw DataFrames, the manipulation of data sets, Series, and panels. If you still have any doubts regarding pandas in Python, ask in the comments.
