Pandas Index & Select Data – 4 Tricks to Solve Any Query

Indexing is one of the core features of pandas. It lets us select and inspect exactly the rows and columns we need, which is the first step of almost any data analysis. Without indexing and selection, working with a large dataset would be extremely difficult. Custom indices also let us access rows by meaningful labels and manage the data more efficiently.

Pandas Index and Select the Data

Before we start the pandas index and select tutorial, let us import pandas:

>>> import pandas as pd

Then, load a CSV file into a DataFrame:

>>> dataflair_df=pd.read_csv("http://theeventscalendar.com/content/uploads/2014/09/test-data-venues11.csv")
>>> dataflair_df #prints the dataset

We can get the index of a DataFrame through its .index attribute (it is an attribute, not a method, so no parentheses are needed):

>>> dataflair_df.index

Output-

RangeIndex(start=0, stop=12, step=1)
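As a minimal sketch of the same idea without downloading the CSV, the snippet below builds a small DataFrame by hand. The name sample_df and the city values are made-up placeholders that only reuse the tutorial's column names; they are not the contents of the real file.

```python
import pandas as pd

# Placeholder rows; only the column names come from the tutorial's dataset.
sample_df = pd.DataFrame({
    "VENUE NAME": ["Party Haus", "San Diego Zoo", "Krispy Kreme"],
    "VENUE CITY": ["City A", "City B", "City C"],
})

# With no explicit index, pandas numbers the rows 0, 1, 2, ...
print(sample_df.index)        # RangeIndex(start=0, stop=3, step=1)
print(list(sample_df.index))  # [0, 1, 2]
```

The stop value is exclusive, so a RangeIndex with stop=12 describes 12 rows labelled 0 through 11.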

Index functions in Pandas DataFrames

1. dataframe_name[]

Here dataframe_name is whatever name you have given your DataFrame. To select a single column, put its label inside square brackets directly after the DataFrame name (there is no dot before the brackets):

>>> dataflair_df["VENUE CITY"]
  • To select multiple columns

Pass a list of column names inside the brackets to select several columns at once:

>>> dataflair_df[["VENUE NAME","VENUE CITY","VENUE ZIP"]]


2. dataframe_name.loc[]

The .loc[] indexer selects by label. To get meaningful row labels, let's first reload the CSV with the "VENUE NAME" column as the index:

>>> dataflair_df=pd.read_csv("http://theeventscalendar.com/content/uploads/2014/09/test-data-venues11.csv", index_col="VENUE NAME")

The "index_col" parameter names the column whose values become the row labels (the index) of the DataFrame.
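To see what index_col changes without relying on the remote file, the sketch below reads a tiny CSV from an in-memory string. The csv_text contents and the variable names are assumptions made for illustration; dtype=str is used only so the placeholder ZIP codes keep their leading zeros.

```python
import io
import pandas as pd

# A tiny stand-in for the real file; the values are placeholders.
csv_text = """VENUE NAME,VENUE CITY,VENUE ZIP
Party Haus,City A,00001
San Diego Zoo,City B,00002
"""

# Without index_col, rows get the default 0..n-1 RangeIndex.
default_df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# With index_col, the named column is moved out of the columns and into the index.
named_df = pd.read_csv(io.StringIO(csv_text), index_col="VENUE NAME", dtype=str)

print(default_df.index)           # RangeIndex(start=0, stop=2, step=1)
print(list(named_df.index))       # ['Party Haus', 'San Diego Zoo']
print(named_df.columns.tolist())  # ['VENUE CITY', 'VENUE ZIP']
```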

Let's use the .index attribute to check what kind of index we are working with now.

>>> dataflair_df.index
  • To select a row with an index name
>>> dataflair_df.loc["Party Haus"]

Output-

select a row with an index name

  • To select multiple rows with dataframe_name.loc[]
>>> dataflair_df.loc[["Party Haus","San Diego Zoo","Krispy Kreme"]]

Output-

To select multiple rows with Index

  • To select 2 rows and only 2 of their columns
>>> dataflair_df.loc[["Party Haus","Bamboo Fresh"],["VENUE ZIP","VENUE PHONE"]]

The first list passed to .loc[] contains the row labels and the second list contains the column labels.

Output-

To select 2 rows with Pandas Index

  • To select all the rows and particular columns
>>> dataflair_df.loc[:,["VENUE CITY","VENUE COUNTRY"]]

Output-

select all the rows and columns by Pandas Index
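The .loc[] patterns above can be tried on a small hand-made frame. This is only a sketch: sample_df, the cities, ZIP codes and phone numbers are placeholders, and the last line (a label slice) is an extra pattern not covered in the text above.

```python
import pandas as pd

# Placeholder values indexed by venue name, mirroring the tutorial's layout.
sample_df = pd.DataFrame(
    {
        "VENUE CITY": ["City A", "City B", "City C", "City D"],
        "VENUE ZIP": ["00001", "00002", "00003", "00004"],
        "VENUE PHONE": ["555-0100", "555-0101", "555-0102", "555-0103"],
    },
    index=pd.Index(
        ["Party Haus", "San Diego Zoo", "Krispy Kreme", "Bamboo Fresh"],
        name="VENUE NAME",
    ),
)

one_row = sample_df.loc["Party Haus"]                      # single label -> Series
many_rows = sample_df.loc[["Party Haus", "Krispy Kreme"]]  # list of labels -> DataFrame
block = sample_df.loc[["Party Haus", "Bamboo Fresh"], ["VENUE ZIP", "VENUE PHONE"]]
all_rows = sample_df.loc[:, ["VENUE CITY"]]                # ":" means every row

# Label slices include BOTH endpoints, unlike ordinary Python slices.
label_slice = sample_df.loc["Party Haus":"Krispy Kreme"]

print(one_row["VENUE ZIP"])  # 00001
print(block.shape)           # (2, 2)
print(len(label_slice))      # 3
```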

3. dataframe_name.iloc[]

To select data by integer position (counting from 0) instead of by label, we use the dataframe_name.iloc[] indexer.

>>> dataflair_df.iloc[4]

Output-

select data by index number in Pandas

  • To select a range of rows
>>> dataflair_df.iloc[:4]

Output-

To select a range of rows in Pandas

  • To select more than one row by position
>>> dataflair_df.iloc[[2,4,6]]

Output-

method of selecting more than one row in Pandas

  • To select both rows and columns
>>> dataflair_df.iloc[[2,3],[5,6]]

The first list contains the integer positions of the rows and the second list contains the integer positions of the columns.

Output-

Pandas select both rows and columns

We can also select all the rows and just a few particular columns.

>>> dataflair_df.iloc[:,[2,4,5]]

Output-

Pandas Select all rows and columns
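Here is a short sketch of the same .iloc[] patterns on a small frame. As before, sample_df and its values are placeholders introduced for illustration, not the real dataset.

```python
import pandas as pd

# Placeholder values; column positions are 0, 1, 2 from left to right.
sample_df = pd.DataFrame(
    {
        "VENUE CITY": ["City A", "City B", "City C", "City D"],
        "VENUE ZIP": ["00001", "00002", "00003", "00004"],
        "VENUE PHONE": ["555-0100", "555-0101", "555-0102", "555-0103"],
    },
    index=pd.Index(
        ["Party Haus", "San Diego Zoo", "Krispy Kreme", "Bamboo Fresh"],
        name="VENUE NAME",
    ),
)

third_row = sample_df.iloc[2]             # single position -> Series
first_two = sample_df.iloc[:2]            # positions 0 and 1; the stop is excluded
picked = sample_df.iloc[[0, 3]]           # list of row positions
cells = sample_df.iloc[[1, 2], [0, 2]]    # row positions, then column positions
one_column = sample_df.iloc[:, [1]]       # every row, column at position 1

print(third_row.name)          # Krispy Kreme
print(len(first_two))          # 2
print(cells.columns.tolist())  # ['VENUE CITY', 'VENUE PHONE']
```

Note that .iloc[] ignores the labels entirely: position 2 is the third row whatever the index contains.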

4. dataframe_name.ix[]

The .ix[] indexer accepted both integer positions and labels. For example:

>>> dataflair_df.ix["Bamboo Fresh"]

Or, we can use

>>> dataflair_df.ix[3]

Output-

dataframe_name ix

Notice the warning telling us that .ix[] is deprecated and that we should use .loc[] or .iloc[] instead. The indexer was deprecated in pandas 0.20.0 and removed entirely in pandas 1.0, so on current versions the two calls above raise an AttributeError.
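Since .ix[] is deprecated (and absent from pandas 1.0 and later), the two lookups above can be rewritten with .loc[] and .iloc[]. The sketch below assumes a placeholder sample_df; the mixed label-and-position lookups at the end show one possible way to cover the case .ix[] used to handle, not the only one.

```python
import pandas as pd

# Placeholder values indexed by venue name.
sample_df = pd.DataFrame(
    {
        "VENUE CITY": ["City A", "City B", "City C", "City D"],
        "VENUE ZIP": ["00001", "00002", "00003", "00004"],
    },
    index=pd.Index(
        ["Party Haus", "San Diego Zoo", "Krispy Kreme", "Bamboo Fresh"],
        name="VENUE NAME",
    ),
)

by_label = sample_df.loc["Bamboo Fresh"]  # replaces .ix["Bamboo Fresh"]
by_position = sample_df.iloc[3]           # replaces .ix[3]
print(by_label.equals(by_position))       # True

# Mixing a row label with a column position: translate one side first.
zip_label = sample_df.columns[1]                       # position -> label
via_loc = sample_df.loc["Bamboo Fresh", zip_label]

row_pos = sample_df.index.get_loc("Bamboo Fresh")      # label -> position
via_iloc = sample_df.iloc[row_pos, 1]

print(via_loc, via_iloc)  # 00004 00004
```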

Multi-Indexing in Pandas

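With multi-indexing (a hierarchical index), a DataFrame or Series can be labelled by several keys at once, which lets us represent and manipulate higher-dimensional data in these one- and two-dimensional structures.

Passing a list of column names to the set_index() method builds an index with one level per column. Note that this replaces the current index, so the "VENUE NAME" labels set earlier are discarded unless we also pass append=True.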

>>> dataflair_df.set_index(["VENUE CITY","VENUE ADDRESS"], inplace=True)
>>> dataflair_df

Now, to check what kind of index we have, we use the .index attribute again:

>>> dataflair_df.index

Output-

Multi-Indexing in Pandas

The result is a MultiIndex, which confirms that the DataFrame now has more than one index level.
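A small sketch of building and querying a two-level index follows. The frame sample_df, the city names and the addresses are invented placeholders; only the column names match the tutorial. Unlike the inplace=True call above, this version keeps the original frame and assigns the result to a new name.

```python
import pandas as pd

# Placeholder rows; cities and addresses are made up for illustration.
sample_df = pd.DataFrame({
    "VENUE NAME": ["Party Haus", "San Diego Zoo", "Krispy Kreme", "Bamboo Fresh"],
    "VENUE CITY": ["City B", "City A", "City B", "City A"],
    "VENUE ADDRESS": ["2 Elm St", "9 Oak Ave", "1 Main St", "5 Pine Rd"],
})

# set_index returns a new DataFrame unless inplace=True is passed.
indexed = sample_df.set_index(["VENUE CITY", "VENUE ADDRESS"])

print(type(indexed.index).__name__)  # MultiIndex
print(indexed.index.nlevels)         # 2
print(list(indexed.index.names))     # ['VENUE CITY', 'VENUE ADDRESS']

# A first-level label selects every row under it.
city_a = indexed.loc["City A"]

# A full tuple (one label per level) selects a single row.
one_row = indexed.loc[("City B", "1 Main St")]

print(len(city_a))            # 2
print(one_row["VENUE NAME"])  # Krispy Kreme
```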

We can sort the dataframe:

>>> dataflair_df.sort_index(inplace=True)
>>> dataflair_df

Output-

Sorting of Dataframes

Here we see that the DataFrame has been sorted alphabetically by the first index level ("VENUE CITY"); rows that share the same city are then ordered by the second level ("VENUE ADDRESS").
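The ordering rule can be checked on a tiny example. Again this is a sketch under assumptions: the frame and its city and address values are placeholders, chosen so that the original row order is deliberately unsorted.

```python
import pandas as pd

# Placeholder rows in a deliberately unsorted order.
sample_df = pd.DataFrame({
    "VENUE NAME": ["Party Haus", "San Diego Zoo", "Krispy Kreme", "Bamboo Fresh"],
    "VENUE CITY": ["City B", "City A", "City B", "City A"],
    "VENUE ADDRESS": ["2 Elm St", "9 Oak Ave", "1 Main St", "5 Pine Rd"],
})
indexed = sample_df.set_index(["VENUE CITY", "VENUE ADDRESS"])

# sort_index compares the first level, then breaks ties with the second.
sorted_df = indexed.sort_index()

print(indexed.index.is_monotonic_increasing)    # False
print(sorted_df.index.is_monotonic_increasing)  # True
print(sorted_df["VENUE NAME"].tolist())
# ['Bamboo Fresh', 'San Diego Zoo', 'Krispy Kreme', 'Party Haus']
```

Sorting a MultiIndex is also useful before slicing by label ranges, since pandas generally expects a sorted index for that.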

Summary

In this Pandas tutorial, we went through the main ways of indexing and selecting data ([], .loc[], .iloc[] and the deprecated .ix[]) and saw a MultiIndex in action. You can now index and select data with pandas in your own projects.
