Convert PDF to AudioBook and Audio Speech to PDF File using Python


A text-to-audio converter reads text aloud to the user. It can support students who struggle with written material, and it helps with proofreading, accurate reading, comprehension, and note-taking. Let’s start developing this popular Python project to convert a PDF file to audio speech.

What is a Python Pdf to Audio converter?

A PDF to audio converter converts text into speech.

In this project, we have created the TextToSpeech function, which converts the text of a file into audio. We can choose a text file from our directory and convert it into audio.

What is Python Audio to Pdf File Converter?

An audio to PDF text converter converts audio into text. In this project, we have created a SpeechToText function that converts audio into text. After clicking the Listen button, the speaker can say anything; the program converts that audio into text and displays it in the text box.

Project to Convert Pdf file to audio using Python

In this project, we have created a GUI-based converter that converts text into audio and vice versa using the tkinter, speech_recognition, os, and win32com libraries, along with the filedialog and messagebox modules of tkinter. The user chooses a file (the Read_File function in Step 8 accepts .txt files) and then clicks the Get Audio button to hear all of its content. That is how we convert text into audio.

To convert audio into text, the user clicks the Listen button and speaks. The recognized text is then displayed in the text box.

Python Pdf file to speech converter Project Prerequisite

This project requires good knowledge of Python, Tkinter, and the speech recognition library. Tkinter is the Python binding to the Tk toolkit, which is used across many programming languages for building graphical user interfaces (GUIs). The speech recognition library transcribes spoken audio into text. Text to speech conversion is done through the Windows speech API, so that part of the project runs on Windows. You also need to know about the Tkinter message box, which is used to display messages in the application window.

Download Pdf file to audio converter Python Project

Please download the source code of the Python Pdf file to audio converter: Convert pdf to audio (& vice versa) using Python

Steps to Build Python Pdf to Audio Converter Project

Below is the list of steps to convert PDF text to audio speech and audio to a PDF file using Python:

  1. Import Modules
  2. Make constructor of the Application class
  3. Functions to draw main frame
  4. Function deleting frame
  5. Speech to Text function
  6. Text to speech function
  7. Conversion functions
  8. Function for reading and clearing files.
  9. Main function

Step 1- Importing Modules.

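#DataFlair - import library
import os
import tkinter as tk
from tkinter import filedialog
from tkinter import messagebox
import speech_recognition as sr
from win32com.client import constants, Dispatch

# Assumed definitions: the later steps use these names but the listing never creates them
r = sr.Recognizer()                  # speech recognizer
mic = sr.Microphone()                # default microphone (needs PyAudio)
speaker = Dispatch('SAPI.SpVoice')   # Windows text-to-speech voice
Working_Dir = os.getcwd()            # starting folder for the file dialog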

Code Explanation-

  • os – This module interacts with the operating system.
  • tkinter – The standard Python interface for creating a graphical user interface.
  • from tkinter import filedialog – Imports the file dialog used to choose a file from the directory.
  • from tkinter import messagebox – Imports the message box separately for showing messages on the screen.
  • speech_recognition – This library converts speech into text.
  • win32com.client – This module provides access to Windows automation (COM) objects; here it is used for text to speech.
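
Since the project depends on packages that are not always installed, it can help to check for them before launching the window. The sketch below is one possible way to do that and is not part of the original project: the helper name missing_modules and the install hints are assumptions for illustration.

```python
from importlib.util import find_spec

# Top-level packages the tutorial imports. The install hints are
# assumptions about how each package is usually obtained.
REQUIRED = {
    "tkinter": "bundled with most Python installers",
    "speech_recognition": "pip install SpeechRecognition",
    "win32com": "pip install pywin32 (Windows only)",
}


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(REQUIRED):
        print(f"Missing {name}: {REQUIRED[name]}")
```

Running the file prints one line per missing package and nothing when everything is available.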

Step 2- Make constructor of Application class

class Application(tk.Frame):
    def __init__(self, master=None):
        super().__init__(master=master)
        self.master = master
        self.pack()
        self.Main_Frame()

Code Explanation-

  • Application – The Application class extends tk.Frame and holds all the functions of the application.
  • __init__ – Application class constructor.
  • super() – Gives access to the methods and properties of the parent class; here it calls the tk.Frame constructor.
  • master – The parent widget that contains the frame; here it is the root window passed in when the Application class is instantiated.
  • .pack() – Places the frame inside its parent window.
  • Main_Frame() – Called at the end of the constructor to draw the main screen.

Step 3- Functions to draw main frame

    def Main_Frame(self):
        self.Delete_Frame()
 
        self.Frame_1 = tk.Frame(self)
        self.Frame_1.config(width=400, height=100)
        self.Frame_1.grid(row=0, column=0, columnspan=2)
 
        self.Label_1 = tk.Label(self.Frame_1)
        self.Label_1['text'] = 'Convert PDF File Text to Audio Speech and vice versa using Python'
        self.Label_1.grid(row=0, column=0, pady=30)
 
        self.Label_2 = tk.Label(self.Frame_1)
        self.Label_2['text'] = 'Requires an Active Internet Connection'
        self.Label_2.grid(row=1, column=0, pady=10, padx=100)
 
        self.SpeehToText = tk.Button(self, bg='#e8c1c7', fg='black',font=("Times new roman", 14, 'bold'))
        self.SpeehToText['text'] = 'Speech to Text'
        self.SpeehToText['command'] = self.SpeechToText
        self.SpeehToText.grid(row=1, column=0, pady=80, padx=60)
 
        self.TextTo_Speech = tk.Button(self, bg='#e8c1c7', fg='black',font=("Times new roman", 14, 'bold'))
        self.TextTo_Speech['text'] = 'Text to Speech'
        self.TextTo_Speech['command'] = self.TextToSpeech
        self.TextTo_Speech.grid(row=1, column=1, pady=60, padx=60)

Code Explanation-

  • Main_Frame() – Function for creating the main frame.
  • Frame_1 – Variable holding the first frame.
  • Set the width, height and grid position of Frame_1.
  • Create two labels, Label_1 and Label_2, and set their text and grid padding.
  • Create two buttons, Speech to Text and Text to Speech, which open the screens for converting speech into text and vice versa.
  • Set the grid position, background color and font of both buttons.
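
Every button in the project repeats the same color and font options. One way to cut the repetition, sketched here as an optional refactoring rather than the project's own code, is to build the keyword arguments in a small helper. The names BUTTON_STYLE and button_options are hypothetical.

```python
# Options shared by the buttons, copied from the listing above.
BUTTON_STYLE = {"bg": "#e8c1c7", "fg": "black"}


def button_options(size, **overrides):
    """Build keyword arguments for tk.Button using the shared style."""
    options = dict(BUTTON_STYLE)          # copy so the shared dict is not modified
    options["font"] = ("Times new roman", size, "bold")
    options.update(overrides)             # e.g. bg='red' for the back button
    return options


# Inside Main_Frame the call could then look like:
#     self.TextTo_Speech = tk.Button(self, text='Text to Speech',
#                                    command=self.TextToSpeech,
#                                    **button_options(14))
```

The back button could reuse the same helper with button_options(18, bg='red').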

Step 4- Function deleting frame

    def Delete_Frame(self):
        for widgets in self.winfo_children():
            widgets.destroy()

Code Explanation-

  • Delete_Frame() – Function for removing every widget from the frame before a new screen is drawn.
  • winfo_children() – A tkinter method that returns a list of all child widgets.
  • destroy() – Method used for destroying a widget.

Step 5- Speech to Text function

    def SpeechToText(self):
        self.Delete_Frame()
 
        self.Listen = tk.Button(self, bg='#e8c1c7', fg='black',font=("Times new roman", 18, 'bold'))
        self.Listen['text'] = 'Listen'
        self.Listen['command'] = self.Audio_Recognizer
        self.Listen.grid(row=0, column=0, pady=40)
 
        self.Back = tk.Button(self, bg='red', fg='black',font=("Times new roman", 18, 'bold'))
        self.Back['text'] = ' ← '
        self.Back['command'] = self.Main_Frame
        self.Back.grid(row=0, column=2)
 
        self.text = tk.Text(self)
        self.text.configure(width=48, height=10)
        self.text.grid(row=1, column=0, columnspan=3)

Code Explanation-

  • SpeechToText() – Function that draws the speech-to-text screen.
  • Listen – We create the Listen button and set its background color, font color and font. Its command is the Audio_Recognizer function.
  • After clicking the Listen button we speak, and Audio_Recognizer converts our speech into text.
  • Back – We have created a back button to go back to the main frame.
  • The text box shows the text of the audio that we have spoken. We also configure the width and height of the text box.

Step 6- Text to speech function

    def TextToSpeech(self):
        self.Delete_Frame()
 
        self.scroll = tk.Scrollbar(self, orient = tk.VERTICAL)
        self.scroll.grid(row=0, column=4, sticky='ns', padx=0)
 
        self.text = tk.Text(self)
        self.text.configure(width=44, height=12)
        self.text.grid(row=0, column=0, columnspan=3)
 
        self.text.config(yscrollcommand=self.scroll.set)
        self.scroll.config(command = self.text.yview)
 
        self.GET_Audio = tk.Button(self, bg='#e8c1c7', fg='black', font=("Times new roman", 17, 'bold'))
        self.GET_Audio['text'] = 'Get Audio'
        self.GET_Audio['command'] = self.Convert_TextToSpeech
        self.GET_Audio.grid(row=1, column=0, pady=50)
 
        self.read_file = tk.Button(self, bg='#e8c1c7', fg='black', font=("Times new roman", 17, 'bold'))
        self.read_file['text'] = 'Read file'
        self.read_file['command'] = self.Read_File
        self.read_file.grid(row=1, column=1)
 
        self.Clear_Frame = tk.Button(self, bg='#e8c1c7', fg='black', font=("Times new roman", 17, 'bold'))
        self.Clear_Frame['text'] = 'Clear'
        self.Clear_Frame['command'] = self.Clear_TextBook
        self.Clear_Frame.grid(row=1, column=2)
 
        self.Back = tk.Button(self, bg='red', fg='black',font=("Times new roman", 17, 'bold'))
        self.Back['text'] = ' <-- '
        self.Back['command'] = self.Main_Frame
        self.Back.grid(row=1, column=3)

Code Explanation-

  • TextToSpeech() – Function that draws the text-to-speech screen.
  • scroll – The scrollbar controls the up and down movement of the text box.
  • In the text box we type the text or load a file from the directory; this text is what gets converted into speech. We also configure the width and height of the text box.
  • GET_Audio – The Get Audio button speaks all the text that is in the text box. We set its background color and font.
  • read_file – The Read File button loads a file that we select from our directory. We set its background color and font.
  • Clear_Frame – The Clear button clears everything in the text box.
  • Back – We have created a back button to go back to the main frame.
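
Passing the whole text box to the speech engine in one call gives the program no chance to pause or stop between sentences. As a hedged sketch of one alternative, the helper below groups whole sentences into chunks under a character limit so each chunk can be spoken separately. The function name split_into_chunks and the max_chars parameter are hypothetical and do not appear in the original project.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [part.strip() for part in parts if part.strip()]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Convert_TextToSpeech could then loop over split_into_chunks(self.msg) and speak one chunk at a time.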

Step 7- Conversion functions

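    def Audio_Recognizer(self):
        self.Clear_TextBook()
        try:
            # Record from the microphone until the speaker pauses
            with mic as source:
                Audio = r.listen(source)

            # Send the recording to Google's web speech API
            msg = r.recognize_google(Audio)
            self.text.insert('1.0', msg)
        except sr.UnknownValueError:
            # The speech could not be understood
            self.text.insert('1.0', 'Could not understand the audio')
        except sr.RequestError:
            # The recognition service could not be reached
            self.text.insert('1.0', 'No internet connection')

    def Convert_TextToSpeech(self):
        self.msg = self.text.get('1.0', tk.END)
        if self.msg.strip() != '':
            speaker.Speak(self.msg)
        else:
            speaker.Speak('Write some message first')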

Code Explanation-

  • Audio_Recognizer() – Function for recognizing the audio.
  • The function uses a try and except block. In the try block the microphone is used as the source: the recognizer listens to the speaker, sends the recording to Google's speech recognition service, and inserts the returned text into the text box.
  • If recognition fails, the except branch writes an error message such as "No internet connection" to the text box.
  • Convert_TextToSpeech() – Function for converting text into speech.
  • It uses an if else condition. If the text box contains text, that text is spoken; otherwise the program speaks the prompt "Write some message first".
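
The emptiness check matters because a tkinter Text widget always returns a trailing newline, so an empty box yields '\n' rather than ''. The decision can be isolated in a small function, sketched below with no GUI or speech engine involved. The names message_to_speak and FALLBACK_MESSAGE are hypothetical.

```python
FALLBACK_MESSAGE = "Write some message first"


def message_to_speak(raw_text, fallback=FALLBACK_MESSAGE):
    """Return the text to speak, or the fallback if the box is empty."""
    cleaned = raw_text.strip()   # drops the trailing newline added by tk.Text
    return cleaned if cleaned else fallback


print(message_to_speak("\n"))        # empty text box
print(message_to_speak("Hello\n"))   # text box containing "Hello"
```

Unlike strip('\n'), a plain strip() also treats a box that contains only spaces as empty.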

Step 8- Function for reading and clearing files.

    def Read_File(self):
        self.filename = filedialog.askopenfilename(initialdir=Working_Dir)

        # Nothing is returned when the dialog is cancelled
        if (not self.filename) or (not self.filename.endswith('.txt')):
            messagebox.showerror("Can't load file", 'Choose a text file to read')
        else:
            with open(self.filename) as f:
                text = f.read()
                self.Clear_TextBook()
                self.text.insert('1.0', text)

    def Clear_TextBook(self):
        self.text.delete('1.0', tk.END)

Code Explanation

  • Read_File() – Function for reading a file.
  • filename – This variable stores the full path of the file chosen in the file dialog.
  • If no file is chosen, or the chosen file does not end in .txt, an error box appears with the title "Can't load file" and the message "Choose a text file to read".
  • Otherwise the file is read and its content replaces whatever is in the text box.
  • Clear_TextBook() – Function for clearing the text box.
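
Read_File as listed accepts only .txt files, so a PDF has to be turned into text first. The sketch below shows one possible loader that handles both file types. It is an illustration under stated assumptions, not the project's code: the function name load_text is hypothetical, the PDF branch assumes the third-party pypdf package is installed, and UTF-8 is assumed for text files.

```python
from pathlib import Path


def load_text(path):
    """Return the text of a .txt or .pdf file; raise ValueError otherwise."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # Assumption: the text file is UTF-8 encoded.
        return path.read_text(encoding="utf-8")
    if suffix == ".pdf":
        # Assumption: pypdf is installed (pip install pypdf).
        from pypdf import PdfReader
        reader = PdfReader(str(path))
        # extract_text() may give nothing for scanned, image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"Unsupported file type: {suffix or 'none'}")
```

Read_File could call load_text(self.filename) inside a try block and show the error box when ValueError is raised.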

Step 9- Main function

root = tk.Tk()
root.geometry('500x300')
root.wm_title('Speech to Text and Text to Speech converter by DataFlair')
 
app = Application(master=root)
app['bg'] = '#e3f4f1'
app.mainloop()

Code Explanation

  • We initialize the main window of the project.
  • tk.Tk() – Creates the tkinter root window of the Python speech to text conversion project.
  • .wm_title() – Used to set the title of the window.
  • .geometry() – Sets the dimensions of the window in pixels.
  • app['bg'] – Sets the background color of the application frame.
  • app.mainloop() – Starts the tkinter event loop.
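
The geometry string can also carry a screen position in the form 'WIDTHxHEIGHT+X+Y'. As a small worked example, the hypothetical helper centered_geometry below computes a string that centers the window on a screen of a given size. It is a sketch and not part of the original project.

```python
def centered_geometry(width, height, screen_width, screen_height):
    """Return a Tk geometry string that centers the window on the screen."""
    x = max((screen_width - width) // 2, 0)    # never place the window off-screen
    y = max((screen_height - height) // 2, 0)
    return f"{width}x{height}+{x}+{y}"


# A 500x300 window on a 1920x1080 screen
print(centered_geometry(500, 300, 1920, 1080))  # 500x300+710+390
```

In the main function this could be used as root.geometry(centered_geometry(500, 300, root.winfo_screenwidth(), root.winfo_screenheight())).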

Python Pdf to Audio Output

[Image: output window of the Python PDF to audio converter]

Summary

Congratulations Friends!!

We have successfully created our Python project, which converts text into audio and audio speech into text using the tkinter, speech_recognition, os, and win32com modules.


DataFlair Team
