Convert PDF to AudioBook and Audio Speech to PDF File using Python
Converting a text file to audio lets a program read the text aloud to the user. This can support students who struggle with written material, and it helps with proofreading, reading accurately, understanding content and writing notes. Let's start developing this popular Python project to convert a PDF file to audio speech.
What is a Python Pdf to Audio converter?
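A PDF to audio converter turns written text into speech.
In this project, we have created the TextToSpeech function, which builds the screen for converting text into audio. We can choose a text file from our directory and convert it into audio. Note that the Read_File function in Step 8 accepts only .txt files, so the text of a PDF has to be saved as a .txt file first.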
What is Python Audio to Pdf File Converter?
An audio to text converter turns speech into text. In this project, we have created a SpeechToText function that builds the speech-to-text screen. After clicking the Listen button, the speaker can say anything; the Audio_Recognizer function converts that audio into text and displays it in the text box.
Project to Convert Pdf file to audio using Python
In this project, we have created a GUI-based converter that converts text into audio and vice versa using the tkinter, speech_recognition, os and win32com libraries, together with the filedialog and messagebox modules of Tkinter. The user chooses a file with the Read file button and then clicks the Get Audio button to hear all of its content. That is how we convert text into audio.
To convert audio into text, the user clicks the Listen button and speaks. The recognized text is then displayed in the text box.
Python Pdf file to speech converter Project Prerequisite
This project requires good knowledge of Python, Tkinter and the speech recognition library. Tkinter is the Python binding to the Tk toolkit, which is used from many programming languages for building graphical user interfaces (GUIs). The speech recognition library transcribes speech into text. Text to speech conversion is done through the Windows speech API, accessed with win32com, so the project runs on Windows. You also need to know the Tkinter messagebox module, which is used to display message boxes on top of the application window.
Download Pdf file to audio converter Python Project
Please download the source code of the Python Pdf file to audio converter: Convert pdf to audio (& vice versa) using Python
Steps to Build Python Pdf to Audio Converter Project
Below is the list of steps to convert PDF text to audio speech and audio to text using Python:
- Import Modules
- Make constructor of the Application class
- Functions to draw main frame
- Function deleting frame
- Speech to Text function
- Text to speech function
- Conversion functions
- Function for reading and clearing files.
- Main function
Step 1- Importing Modules.
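#DataFlair - import library
import os
import tkinter as tk
from tkinter import filedialog
from tkinter import messagebox
import speech_recognition as sr
from win32com.client import constants, Dispatch

# The later steps use these four names, but the listing never defines them.
# The definitions below are an assumption about the intended setup.
speaker = Dispatch('SAPI.SpVoice')   # Windows text-to-speech voice
r = sr.Recognizer()                  # speech recognizer
mic = sr.Microphone()                # default microphone (needs PyAudio)
Working_Dir = os.getcwd()            # starting folder for the file dialog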
Code Explanation-
- os – This module interacts with the operating system.
- Tkinter module – Tkinter is the standard interface in Python for creating a graphical user interface.
- from tkinter import filedialog – Imports the file dialog used to choose a file from the directory.
- from tkinter import messagebox – Imports the message box separately for showing messages on the screen.
- speech_recognition – This library converts speech into text.
- win32com.client – The win32com.client module provides access to Windows automation (COM) objects, which this project uses for text to speech.
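Two of these imports are third-party packages: speech_recognition, and win32com, which comes from pywin32 and is Windows-only. It can help to check that they are importable before building the window. The sketch below is not part of the original project, and the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A dotted name raises ModuleNotFoundError when its parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names imported by this project.
    required = ["tkinter", "speech_recognition", "win32com.client"]
    print("Missing modules:", missing_modules(required))
```

An empty list means every module was found. Anything listed has to be installed before the project can run.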
Step 2- Make constructor of Application class
class Application(tk.Frame):
    def __init__(self, master=None):
        super().__init__(master=master)
        self.master = master
        self.pack()
        self.Main_Frame()
Code Explanation-
- Application() – The Application class extends tk.Frame and contains all the functions of the project.
- __init__ – The Application class constructor.
- super() – Gives access to the methods and properties of the parent class. Here it calls the tk.Frame constructor.
- .master – The master parameter is the parent widget that contains the frame. In this project it is the main window.
- .pack – Places the frame inside its parent widget.
- Main_Frame() – Called at the end of the constructor to draw the main frame.
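To see what super().__init__(master=master) does without opening a window, here is a minimal sketch that replaces tk.Frame with a toy stand-in. FakeFrame and DemoApp are hypothetical names used only for illustration. The real tk.Frame does far more than this.

```python
class FakeFrame:
    """Toy stand-in for tk.Frame: it only remembers its parent widget."""

    def __init__(self, master=None):
        self.master = master
        self.packed = False

    def pack(self):
        # The real pack() hands the widget to a geometry manager.
        self.packed = True


class DemoApp(FakeFrame):
    def __init__(self, master=None):
        # Run the parent constructor first so the frame is fully set up.
        super().__init__(master=master)
        self.pack()
        self.Main_Frame()

    def Main_Frame(self):
        # The real project builds its labels and buttons here.
        self.frame_drawn = True


app = DemoApp(master="root window")
print(app.master, app.packed, app.frame_drawn)  # root window True True
```

The order matters: the parent constructor runs before pack() and Main_Frame(), because both rely on state that the parent sets up.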
Step 3- Functions to draw main frame
    def Main_Frame(self):
        self.Delete_Frame()
        self.Frame_1 = tk.Frame(self)
        self.Frame_1.config(width=400, height=100)
        self.Frame_1.grid(row=0, column=0, columnspan=2)

        self.Label_1 = tk.Label(self.Frame_1)
        self.Label_1['text'] = 'Convert PDF File Text to Audio Speech and vice versa using Python'
        self.Label_1.grid(row=0, column=0, pady=30)

        self.Label_2 = tk.Label(self.Frame_1)
        self.Label_2['text'] = 'Requires an Active Internet Connection'
        self.Label_2.grid(row=1, column=0, pady=10, padx=100)

        self.SpeehToText = tk.Button(self, bg='#e8c1c7', fg='black', font=("Times new roman", 14, 'bold'))
        self.SpeehToText['text'] = 'Speech to Text'
        self.SpeehToText['command'] = self.SpeechToText
        self.SpeehToText.grid(row=1, column=0, pady=80, padx=60)

        self.TextTo_Speech = tk.Button(self, bg='#e8c1c7', fg='black', font=("Times new roman", 14, 'bold'))
        self.TextTo_Speech['text'] = 'Text to Speech'
        self.TextTo_Speech['command'] = self.TextToSpeech
        self.TextTo_Speech.grid(row=1, column=1, pady=60, padx=60)
Code Explanation-
- Main_Frame() – Function for creating the main frame.
- Frame_1 – The frame that holds the two labels.
- We set the width, height and grid position of Frame_1.
- We create two labels, Label_1 and Label_2, and set their text and grid padding.
- We create two buttons, Speech to Text and Text to Speech, which open the screens for converting speech into text and vice versa.
- We set the grid position, background color and font of each button.
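Every button in this project repeats the same background, foreground and font options. One way to reduce the repetition, sketched here as an assumption rather than as the original design, is to build the shared keyword arguments in a helper. The name button_style is hypothetical.

```python
def button_style(size=14, bg='#e8c1c7'):
    """Return keyword arguments shared by the project's buttons."""
    return {
        'bg': bg,
        'fg': 'black',
        'font': ('Times new roman', size, 'bold'),
    }


# Inside Main_Frame this could be used as:
#     self.TextTo_Speech = tk.Button(self, text='Text to Speech', **button_style())
print(button_style(18, bg='red'))
```

The Back buttons in the later steps would use button_style(18, bg='red'), and the remaining buttons only need a different font size.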
Step 4- Function deleting frame
    def Delete_Frame(self):
        for widgets in self.winfo_children():
            widgets.destroy()
Code Explanation-
- Delete_Frame() – Function for clearing the frame before a new screen is drawn.
- winfo_children() – A tkinter method that returns a list of all child widgets.
- destroy() – Method used for destroying a widget.
Step 5- Speech to Text function
    def SpeechToText(self):
        self.Delete_Frame()
        self.Listen = tk.Button(self, bg='#e8c1c7', fg='black', font=("Times new roman", 18, 'bold'))
        self.Listen['text'] = 'Listen'
        self.Listen['command'] = self.Audio_Recognizer
        self.Listen.grid(row=0, column=0, pady=40)

        self.Back = tk.Button(self, bg='red', fg='black', font=("Times new roman", 18, 'bold'))
        self.Back['text'] = ' ← '
        self.Back['command'] = self.Main_Frame
        self.Back.grid(row=0, column=2)

        self.text = tk.Text(self)
        self.text.configure(width=48, height=10)
        self.text.grid(row=1, column=0, columnspan=3)
Code Explanation-
- SpeechToText() – Function that draws the screen for converting speech into text.
- Here we create the Listen button and set its background color, font color and font. Its command is the Audio_Recognizer function.
- After clicking the Listen button we speak, and Audio_Recognizer converts our speech into text.
- Back – We have created a back button to go back to the main frame.
- The text box shows the text of the audio that we have spoken. We also configure the width and height of the text box.
Step 6- Text to speech function
    def TextToSpeech(self):
        self.Delete_Frame()
        self.scroll = tk.Scrollbar(self, orient=tk.VERTICAL)
        self.scroll.grid(row=0, column=4, sticky='ns', padx=0)

        self.text = tk.Text(self)
        self.text.configure(width=44, height=12)
        self.text.grid(row=0, column=0, columnspan=3)
        self.text.config(yscrollcommand=self.scroll.set)
        self.scroll.config(command=self.text.yview)

        self.GET_Audio = tk.Button(self, bg='#e8c1c7', fg='black', font=("Times new roman", 17, 'bold'))
        self.GET_Audio['text'] = 'Get Audio'
        self.GET_Audio['command'] = self.Convert_TextToSpeech
        self.GET_Audio.grid(row=1, column=0, pady=50)

        self.read_file = tk.Button(self, bg='#e8c1c7', fg='black', font=("Times new roman", 17, 'bold'))
        self.read_file['text'] = 'Read file'
        self.read_file['command'] = self.Read_File
        self.read_file.grid(row=1, column=1)

        self.Clear_Frame = tk.Button(self, bg='#e8c1c7', fg='black', font=("Times new roman", 17, 'bold'))
        self.Clear_Frame['text'] = 'Clear'
        self.Clear_Frame['command'] = self.Clear_TextBook
        self.Clear_Frame.grid(row=1, column=2)

        self.Back = tk.Button(self, bg='red', fg='black', font=("Times new roman", 17, 'bold'))
        self.Back['text'] = ' <-- '
        self.Back['command'] = self.Main_Frame
        self.Back.grid(row=1, column=3)
Code Explanation-
- TextToSpeech() – Function that draws the screen for converting text into speech.
- scroll – The scrollbar controls the up and down movement of the text box.
- In the text box we either type the text or load a .txt file from the directory, and the text is then converted into speech. We also configure the width and height of the text box.
- GET_Audio – The Get Audio button. Clicking it reads aloud all the text in the text box. We set its background color and font.
- read_file – The Read file button loads a file that we select from our directory. We set its background color and font.
- Clear_Frame – The Clear button removes all the text from the text box.
- Back – We have created a back button to go back to the main frame.
Step 7- Conversion functions
    def Audio_Recognizer(self):
        self.Clear_TextBook()
        try:
            # Record from the microphone, then send the audio to Google's recognizer
            with mic as source:
                Audio = r.listen(source)
            msg = r.recognize_google(Audio)
            self.text.insert('1.0', msg)
        except Exception:
            # Any failure (no network, no microphone, unclear speech) ends up here
            self.text.insert('1.0', 'No internet connection')

    def Convert_TextToSpeech(self):
        self.msg = self.text.get(1.0, tk.END)
        if self.msg.strip('\n') != '':
            speaker.speak(self.msg)
        else:
            speaker.speak('Write some message first')
Code Explanation-
- Audio_Recognizer() – Function for recognizing the audio.
- Here we use a try and except block. In the try block the microphone is used as the source, the recognizer listens to the speaker's audio, and the recognized text is inserted into the text box.
- If anything fails, the except block inserts the message "No internet connection" into the text box.
- Convert_TextToSpeech() – Function for converting text into speech.
- Here we use an if-else condition. If the text box contains text, that text is spoken aloud. Otherwise the program speaks the message "Write some message first".
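A tk.Text widget always ends its content with a newline, which is why Convert_TextToSpeech strips '\n' before comparing with the empty string. That check still treats a box containing only spaces as a message. The sketch below shows a slightly stricter check. The helper has_speakable_text is hypothetical and not part of the original code.

```python
def has_speakable_text(msg):
    """Return True when msg contains something other than whitespace."""
    return msg.strip() != ''


# An empty Text widget returns a lone newline from get('1.0', tk.END).
print(has_speakable_text('\n'))        # False
print(has_speakable_text('   \n'))     # False
print(has_speakable_text('Hello\n'))   # True
```

In Convert_TextToSpeech, the condition could then read if has_speakable_text(self.msg): with the rest of the function unchanged.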
Step 8- Function for reading and clearing files.
    def Read_File(self):
        self.filename = filedialog.askopenfilename(initialdir=Working_Dir)
        if (self.filename == '') or (not self.filename.endswith('.txt')):
            # Double quotes so the apostrophe in "Can't" does not end the string
            messagebox.showerror("Can't load file", 'Choose a text file to read')
        else:
            with open(self.filename) as f:
                text = f.read()
            self.Clear_TextBook()
            self.text.insert('1.0', text)

    def Clear_TextBook(self):
        self.text.delete(1.0, tk.END)
Code Explanation-
- Read_File() – Function for reading a file.
- filename – This variable stores the full path of the file selected in the file dialog.
- If we have not chosen a file, or the chosen file is not a .txt file, an error box titled "Can't load file" shows the message "Choose a text file to read".
- Otherwise the file is read, the text box is cleared and the file's text is inserted into it.
- Clear_TextBook() – Function for clearing the text box.
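Read_File accepts only names ending in lowercase .txt and opens the file with the platform's default encoding. A hedged sketch of a more forgiving variant follows. The helper names is_text_file and load_text are hypothetical, and UTF-8 is an assumption about the input files.

```python
import os
import tempfile


def is_text_file(path):
    """Return True for a non-empty path with a .txt extension, in any letter case."""
    return bool(path) and os.path.splitext(path)[1].lower() == '.txt'


def load_text(path):
    """Read the file as UTF-8, replacing bytes that cannot be decoded."""
    with open(path, encoding='utf-8', errors='replace') as f:
        return f.read()


# Small demonstration with a temporary file.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, 'NOTES.TXT')
    with open(sample, 'w', encoding='utf-8') as f:
        f.write('Hello from a text file')
    print(is_text_file(sample), load_text(sample))
```

Read_File could call is_text_file(self.filename) in its condition and load_text(self.filename) in its else branch.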
Step 9- Main function
root = tk.Tk()
root.geometry('500x300')
root.wm_title('Speech to Text and Text to Speech converter by DataFlair')
app = Application(master=root)
app['bg'] = '#e3f4f1'
app.mainloop()
Code Explanation-
- We initialize the main window of the project.
- tk.Tk() – Creates the main tkinter window of the Python speech to text conversion project.
- .geometry – Sets the dimensions of the window in pixels.
- .wm_title – Sets the title of the window.
- Application(master=root) – Creates the application frame inside the main window.
- app['bg'] – Sets the background color.
- .mainloop – Starts the tkinter event loop.
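The string passed to geometry() has the form widthxheight, optionally followed by +x+y offsets for the window position. As a sketch, the pure function below builds a string that centers the window on a screen of a given size. The helper centered_geometry is hypothetical. In a real program the screen size would come from root.winfo_screenwidth() and root.winfo_screenheight().

```python
def centered_geometry(width, height, screen_width, screen_height):
    """Build a Tk geometry string that centers a window on the screen."""
    x = max((screen_width - width) // 2, 0)
    y = max((screen_height - height) // 2, 0)
    return f'{width}x{height}+{x}+{y}'


print(centered_geometry(500, 300, 1920, 1080))  # 500x300+710+390
```

The result can be passed directly to root.geometry() in place of '500x300'.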
Python Pdf to Audio Output
Summary
Congratulations Friends!!
We have successfully created our Python project, which converts text into audio and audio speech into text using the tkinter, os, speech_recognition and win32com modules.