Python Data File Formats – How to Read CSV, JSON, and XLS Files
Master Python with 70+ Hands-on Projects and Get Job-ready - Learn Python
1. How to Read CSV, JSON, and XLS Files
In our last python tutorial, we studied How to Work with Relational Database with Python. In this tutorial, we will discuss different types of Python Data File Formats: Python CSV, JSON, and XLS. Moreover, we will discuss how to read CSV, JSON, XLS files in Python Programming Language.
So, let’s see Python Data File Formats.
2. Python Data File Formats
Let’s first learn a little about the Python Data File formats we will be processing with.
a. Python Data File Formats – Python CSV
Python CSV data is a basic with data science. A Comma-Separated-Value file uses commas to separate values. You can look at it as a delimited text file that holds tabular data as plain text. One problem with this may arise when the data it holds contains a comma or a line break- we can use other delimiters like a tab stop. This Python data file format proves useful in exchanging data and in moving tabular data between programs. The extension for a CSV file is .csv.
Follow this link to know about File Handling In Python
Here’s a Python CSV file we will use for our demo-
id,title,timing,genre,rating
1,Dog with a Blog,17:30-18:00,Comedy,4.7
2,Liv and Maddie,18:00-18:30,Comedy,6.3
3,Girl Meets World,18:30-19:00,Comedy,7.2
4,KC Undercover,19:00-19:30,Comedy,6.1
5,Austin and Ally,19:30-20:00,Comedy,6
We saved this as schedule.csv on our Desktop. Remember to save as All files (*.*). When we open this file, it opens in Microsoft Excel by default on Windows-
b. Python Data File Formats – Python JSON
JSON stands for JavaScript Object Notation and is an open standard file format. While it holds attribute-value pairs and array data types, it uses human-readable text for this. This Python data file format is language-independent and we can use it in asynchronous browser-server communication. The extension for a Python JSON file is .json.
Let’s Explore Python Zipfile – Benefits, Modules, Objects
Here’s the JSON file in Python we will use for the demo-
{ "ID":["1","2","3","4","5"], "Title":["Dog with a Blog","Liv and Maddie","Girl Meets World","KC Undercover","Austin and Ally"], "Timing":["17:30-18:00","18:00-18:30","18:30-19:00","19:00-19:30","19:30-20:00"], "Genre":["Comedy","Comedy","Comedy","Comedy","Comedy"], "Rating":["4.7","6.3","7.2","6.1","6"] }
We save this as schedule.json on the Desktop.
c. Python Data File Formats –Â Python XLS
Technology is evolving rapidly!
Stay updated with DataFlair on WhatsApp!!
The extension for an Excel spreadsheet is .xlsx. This proves useful for data science; we create a workbook with two sheets in Microsoft Excel.
Sheet 1-
Sheet 2-
We save this workbook as schedule.xlsx on our Desktop.
Do you Know the XML Processing in Python 3Â
3. Prerequisites
To process these Python data file formats, we need the library pandas.
Install it using pip-
>>> pip install pandas
4. How to Read CSV File in Python?
To read an entire file, rows, columns, or combinations of those, read on.
a. Reading an entire Python CSV File
To read an entire file, we can use the read_csv() function.
>>> import pandas >>> import os >>> os.chdir('C:\\Users\\lifei\\Desktop') >>> print(pandas.read_csv('schedule.csv'))
Read about Python Tools – 4 Major Utilities
b. Reading rows for one column
Sometimes, we may only want to fetch certain rows at once for a certain column. We use the slicing operator for this. Read up on Python Operators.
>>> data=pandas.read_csv('schedule.csv') >>> print(data[0:3]['title'])
c. Reading certain columns
Now we can also fetch only certain columns. For this, we use the .loc() method for multi-axis indexing.
>>> data=pandas.read_csv('schedule.csv') >>> print(data.loc[:,['title','rating']])
d. Reading certain rows and certain columns
Using the .loc() method, we can also fetch certain rows and certain columns:
>>> data=pandas.read_csv('schedule.csv') >>> print(data.loc[[1,3],['title','rating']])
e. Reading certain columns for a range of rows
Now to fetch certain columns for a range of rows, we slightly change the previous syntax and use slicing instead of indices.
data=pandas.read_csv('schedule.csv') >>> print(data.loc[1:3,['title','rating']])
Let’s revise Python Generators and Generator Expressions
f. Using the Python CSV Module
You can also install the module CSV using pip, and then use the following code-
>>> import csv >>> fields=rows=[] >>> with open('schedule.csv','r') as file:                reader=csv.reader(file) #Reader object                fields=next(reader)                for row in reader:                                rows.append(row)                print(f"You have {reader.line_num} rows") #Number of rows              You have 6 rows >>> print('          '.join(field for field in fields)) #Field names id           title       timing  genre   rating >>> for row in rows[:6]:                for col in row:                                print(col)                print('\n')
1
Dog with a Blog
17:30-18:00
Comedy
4.7
2
Liv and Maddie
18:00-18:30
Comedy
6.3
3
Girl Meets World
18:30-19:00
Comedy
7.2
4
KC Undercover
19:00-19:30
Comedy
6.1
5
Austin and Ally
19:30-20:00
Comedy
6
5. How to Read JSON File in Python?
Let’s Discuss Python Decision Making StatementsÂ
a. Reading an entire Python JSON file
To read an entire JSN file in Python, we can use the read_json() function.
>>> import pandas >>> print(pandas.read_csv('schedule.csv'))
b. Reading certain rows and columns
And when you only want some rows and some columns, you can-
>>> data=pandas.read_csv('schedule.csv') >>> data.loc[[1,3],['title','rating']]
c. Reading rows and columns as records
You can also fetch the rows and columns as a sequence of records-
>>> data=pandas.read_csv('schedule.csv') >>> print(data.to_json(orient='records',lines=True))
6. How to Read XLS File in Python?
Finally, let’s find out how to read Python XLS files.
a. Reading an entire file
You can read an entire file using the read_excel() function.
>>> import pandas >>> print(pandas.read_excel('schedule.xlsx'))
Note- You may need to install a module for this-
C:\Users\lifei>pip install xlrd
Have a Look Python Sets and Booleans with Syntax and Examples
b. Reading certain rows and columns
When you only want to fetch certain rows and columns, you can use the .loc() method.
>>> data=pandas.read_excel('schedule.xlsx') >>> print(data.loc[[1,3],['title','rating']])
c. Reading more than one sheet
>>> with pandas.ExcelFile('schedule.xlsx') as book: Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â s1=pandas.read_excel(book,'Sheet1') Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â s2=pandas.read_excel(book,'Sheet2')
>>> s1[0:5]['title']
>>> s2[0:5]['years']
d. Using the Python XLRD Module
>>> import xlrd >>> book=xlrd.open_workbook('schedule.xlsx') >>> sheet=book.sheet_by_index(0) >>> sheet.cell_value(0,0) #Value at cell at row 0 and column 0 'id' >>> sheet.nrows #Number of rows 6 >>> sheet.ncols #Number of columns 5 >>> for i in range(sheet.ncols):                sheet.cell_value(0,i) #Column names              'id' 'title' 'timing' 'genre' 'rating' >>> sheet.row_values(1) #First row [1.0, 'Dog with a Blog', '17:30-18:00', 'Comedy', 4.7]
So, this was all about Python Data File Formats. Hope you like our explanation.
7. Conclusion
Hence, in this tutorial, we discussed different types of Python Data File Formats. In addition, we also learned how to process/read formats like CSV, JSON, and XLS files in Python. Leave your comments below and don’t forget to visit again for another tutorial tomorrow.
See Also –Â Recursion in Python with Examples
For reference
If you are Happy with DataFlair, do not forget to make us happy with your positive feedback on Google
Hi
There is a mistake in in explaining how to read JSON file in Python ? The explanation seems a copy paste of “How to Read .csv file in Python?”
Could you please address it and provide the process to read a JSON file in python?
Appreciate it.
for json read file correct the statement its should be read_json instead of read_csv.
a. Reading an entire Python JSON file
To read an entire JSN file in Python, we can use the read_json() function.
>>> import pandas
>>> print(pandas.read_csv(‘schedule.csv’))
check this section.
Yes, we are updating the same. Thanks
Thanks!