Pandas Index & Select Data – 4 Tricks to Solve Any Query
Indexing is one of the most important operations in pandas. It lets us select and inspect exactly the rows and columns we need, which is the first step of almost any data analysis. Without indexing and selection, analyzing data in pandas would be extremely difficult. With custom indices, we can look up data by meaningful labels and manage it efficiently.
Pandas Index and Select the Data
Before we start the pandas index and select tutorial, let us import pandas:
>>> import pandas as pd
Then, import a CSV file:
>>> dataflair_df = pd.read_csv("http://theeventscalendar.com/content/uploads/2014/09/test-data-venues11.csv")
>>> dataflair_df  # prints the dataset
Output-
We can get the index of a dataframe through its .index attribute. Note that .index is an attribute, not a method, so it is written without parentheses.
>>> dataflair_df.index
Output-
RangeIndex(start=0, stop=12, step=1)
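Because the CSV above is hosted remotely and may not always be reachable, here is a minimal sketch of the same check on a small hand-built dataframe. The name venues and all of the values in it are made up for illustration; only the column names are borrowed from the tutorial.

```python
import pandas as pd

# Hypothetical stand-in for the venues CSV; the values are invented.
venues = pd.DataFrame({
    "VENUE NAME": ["Party Haus", "Bamboo Fresh", "San Diego Zoo"],
    "VENUE CITY": ["Chicago", "Austin", "San Diego"],
})

# .index is an attribute, so there are no parentheses.
idx = venues.index
print(idx)       # RangeIndex(start=0, stop=3, step=1)
print(len(idx))  # 3
```

When no index column is specified, pandas falls back to a RangeIndex that simply numbers the rows from 0.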
1. dataframe_name[]
Here, dataframe_name is whatever name you have chosen for your dataframe.
- To select a specific column, put the column name inside dataframe_name[]
>>> dataflair_df["VENUE CITY"]
- To select multiple columns
Pass a list of column names inside the brackets to select several columns at once.
>>> dataflair_df[["VENUE NAME","VENUE CITY","VENUE ZIP"]]
Output-
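The difference between the two forms is easiest to see from the type of the result. The sketch below uses a small invented dataframe (venues and its values are assumptions, not the tutorial's dataset) to show that a single column name returns a Series, while a list of names returns a DataFrame.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the tutorial.
venues = pd.DataFrame({
    "VENUE NAME": ["Party Haus", "Bamboo Fresh", "San Diego Zoo"],
    "VENUE CITY": ["Chicago", "Austin", "San Diego"],
    "VENUE ZIP": [60601, 73301, 92101],
})

city = venues["VENUE CITY"]                    # one label -> Series
subset = venues[["VENUE NAME", "VENUE ZIP"]]   # list of labels -> DataFrame

print(type(city).__name__)    # Series
print(type(subset).__name__)  # DataFrame
print(list(subset.columns))   # ['VENUE NAME', 'VENUE ZIP']
```

Note the double brackets in the second call: the outer pair is the selection operator and the inner pair is an ordinary Python list.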
2. Dataframe_name.loc[]
Let's load the data again, this time using one of its columns as the index:
>>> dataflair_df=pd.read_csv("http://theeventscalendar.com/content/uploads/2014/09/test-data-venues11.csv", index_col="VENUE NAME")
The "index_col" parameter lets us choose which column is used as the index.
Let's use the .index attribute to check the kind of index we are working with now.
>>> dataflair_df.index
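As a rough illustration that does not depend on the remote file, the sketch below reads a tiny inline CSV (the text in csv_text is invented) and shows that index_col produces the same index as calling set_index() after loading.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the venues file.
csv_text = """VENUE NAME,VENUE CITY,VENUE ZIP
Party Haus,Chicago,60601
Bamboo Fresh,Austin,73301
"""

# Option 1: choose the index column while reading.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="VENUE NAME")

# Option 2: read first, then promote a column to the index.
same = pd.read_csv(io.StringIO(csv_text)).set_index("VENUE NAME")

print(by_name.index.name)      # VENUE NAME
print(list(by_name.index))     # ['Party Haus', 'Bamboo Fresh']
print(list(by_name.columns))   # ['VENUE CITY', 'VENUE ZIP']
```

In both cases the chosen column leaves the regular columns and becomes the row labels that .loc[] looks up.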
- To select a row with an index name
>>> dataflair_df.loc["Party Haus"]
Output-
- To select multiple rows with dataframe_name.loc[]
>>> dataflair_df.loc[["Party Haus","San Diego Zoo","Krispy Kreme"]]
Output-
- To select two rows and only two of their columns
>>> dataflair_df.loc[["Party Haus","Bamboo Fresh"],["VENUE ZIP","VENUE PHONE"]]
The first list holds the row labels and the second list holds the column labels.
Output-
- To select all the rows and particular columns
>>> dataflair_df.loc[:,["VENUE CITY","VENUE COUNTRY"]]
Output-
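The four .loc[] patterns above can be tried together on a small dataframe. This is only a sketch: venues, its index labels, and its values are invented for the example, with column names borrowed from the tutorial.

```python
import pandas as pd

# Hypothetical data indexed by venue name.
venues = pd.DataFrame(
    {
        "VENUE CITY": ["Chicago", "Austin", "San Diego"],
        "VENUE ZIP": [60601, 73301, 92101],
        "VENUE PHONE": ["555-0100", "555-0101", "555-0102"],
    },
    index=pd.Index(["Party Haus", "Bamboo Fresh", "San Diego Zoo"], name="VENUE NAME"),
)

one_row = venues.loc["Party Haus"]                       # single label -> Series
two_rows = venues.loc[["Party Haus", "San Diego Zoo"]]   # list of labels -> DataFrame
block = venues.loc[["Party Haus", "Bamboo Fresh"],       # row labels first,
                   ["VENUE ZIP", "VENUE PHONE"]]         # then column labels
all_rows = venues.loc[:, ["VENUE CITY"]]                 # ":" keeps every row

print(one_row["VENUE CITY"])  # Chicago
print(two_rows.shape)         # (2, 3)
print(block.shape)            # (2, 2)
print(all_rows.shape)         # (3, 1)
```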
3. dataframe_name.iloc[]
To select data by integer position rather than by label, we use dataframe_name.iloc[]. Positions start at 0, so the following returns the fifth row:
>>> dataflair_df.iloc[4]
Output-
- To select a range of rows
>>> dataflair_df.iloc[:4]
Output-
- To select more than one row by position
>>> dataflair_df.iloc[[2,4,6]]
Output-
- To select both rows and columns
>>> dataflair_df.iloc[[2,3],[5,6]]
The first list contains the integer positions of the rows and the second list contains the integer positions of the columns.
Output-
We can also select all the rows and just a few particular columns.
>>> dataflair_df.iloc[:,[2,4,5]]
Output-
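The same positional patterns can be checked on a small dataframe. In this sketch the frame venues and its values are assumptions made for illustration; the point is only how the row and column positions map to the result.

```python
import pandas as pd

# Hypothetical data; positions are what matter here, not the labels.
venues = pd.DataFrame(
    {
        "VENUE CITY": ["Chicago", "Austin", "San Diego"],
        "VENUE ZIP": [60601, 73301, 92101],
        "VENUE PHONE": ["555-0100", "555-0101", "555-0102"],
    },
    index=["Party Haus", "Bamboo Fresh", "San Diego Zoo"],
)

second_row = venues.iloc[1]             # single position -> Series
first_two = venues.iloc[:2]             # slice of rows (end is exclusive)
picked_rows = venues.iloc[[0, 2]]       # list of row positions
picked_cols = venues.iloc[:, [0, 2]]    # all rows, columns 0 and 2
block = venues.iloc[[0, 1], [1, 2]]     # rows 0-1, columns 1-2

print(second_row.name)            # Bamboo Fresh
print(list(first_two.index))      # ['Party Haus', 'Bamboo Fresh']
print(list(picked_cols.columns))  # ['VENUE CITY', 'VENUE PHONE']
print(block.shape)                # (2, 2)
```

A list on its own, as in venues.iloc[[0, 2]], always refers to rows; to pick columns, put a row selector (such as ":") before the comma.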
4. dataframe_name.ix[]
In older versions of pandas, the .ix[] indexer accepted both row labels and integer positions. For example:
>>> dataflair_df.ix["Bamboo Fresh"]
Or, we can use
>>> dataflair_df.ix[3]
Output-
In older versions, this also prints a warning that .ix[] is deprecated. The indexer was removed entirely in pandas 1.0, so on current versions these two calls raise an AttributeError. Use .loc[] for labels and .iloc[] for integer positions instead.
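A minimal sketch of how the two .ix[] lookups translate to current pandas, assuming a small invented dataframe (venues and its values are hypothetical) indexed by venue name:

```python
import pandas as pd

# Hypothetical data indexed by venue name.
venues = pd.DataFrame(
    {
        "VENUE CITY": ["Chicago", "Austin", "San Diego"],
        "VENUE ZIP": [60601, 73301, 92101],
    },
    index=["Party Haus", "Bamboo Fresh", "San Diego Zoo"],
)

by_label = venues.loc["Bamboo Fresh"]   # replaces .ix["Bamboo Fresh"]
by_position = venues.iloc[1]            # replaces .ix[1]

# Both refer to the same row here, so the results match.
print(by_label.equals(by_position))     # True

# Mixing a row label with a column position: look up the column name first.
zip_code = venues.loc["Bamboo Fresh", venues.columns[1]]
print(zip_code)                         # 73301
```

Splitting the two behaviours into .loc[] and .iloc[] removes the ambiguity .ix[] had when the index itself contained integers.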
Multi-Indexing in Pandas
With multi-indexing in pandas, we can access and manipulate data across multiple dimensions using data structures such as DataFrame and Series.
Passing a list of column names to the set_index() function creates an index with multiple levels.
>>> dataflair_df.set_index(["VENUE CITY","VENUE ADDRESS"], inplace=True)
>>> dataflair_df
Now, to check the kind of index we have, we use the .index attribute:
>>> dataflair_df.index
Output-
The result is a MultiIndex, which confirms that the dataframe now has multiple index levels.
We can sort the dataframe:
>>> dataflair_df.sort_index(inplace=True)
>>> dataflair_df
Output-
Here we see that the DataFrame has been sorted alphabetically by the first index level, with ties broken by the second level.
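The multi-index steps above can be sketched end to end on a small dataframe. Everything in venues, including the addresses, is invented for illustration. The sketch uses the returning form of set_index() and sort_index() rather than inplace=True, which is a stylistic choice and not something the tutorial prescribes.

```python
import pandas as pd

# Hypothetical data with two venues in the same city.
venues = pd.DataFrame({
    "VENUE NAME": ["San Diego Zoo", "Party Haus", "Krispy Kreme"],
    "VENUE CITY": ["San Diego", "Chicago", "Chicago"],
    "VENUE ADDRESS": ["9 Park Rd", "5 Elm St", "12 Oak St"],
})

# Build a two-level index and sort it.
indexed = venues.set_index(["VENUE CITY", "VENUE ADDRESS"]).sort_index()

print(type(indexed.index).__name__)  # MultiIndex
print(indexed.index.nlevels)         # 2

# A label for the first level selects every row under it.
chicago = indexed.loc["Chicago"]
print(len(chicago))                  # 2

# A tuple with one label per level selects a single row.
one = indexed.loc[("Chicago", "5 Elm St")]
print(one["VENUE NAME"])             # Party Haus
```

Sorting the index first is worthwhile because lookups and slices on a sorted MultiIndex are faster and avoid performance warnings.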
Summary
In this pandas tutorial, we went through the main ways of indexing and selecting data, namely [], .loc[], .iloc[], and the deprecated .ix[], and saw a MultiIndex in action. You can now apply indexing and selection with pandas in your own projects.
The method of selecting more than one column
>>> dataflair_df.iloc[[2,4,6]]
This selects rows, not columns. To select more than one column, it should be as below:
>>> dataflair_df.iloc[:,[2,4,6]]
The site used in
>>> dataflair_df=pd.read_csv("http://theeventscalendar.com/content/uploads/2014/09/test-data-venues11.csv")
does not allow the CSV dataset to be imported at all.