Python Speech Recognition – Artificial Intelligence
Welcome to our Python Speech Recognition tutorial. In this tutorial on AI with Python, we will learn to read an audio file with Python and transcribe the speech in it to text, using the SpeechRecognition API. We will also discuss reading just a segment of a file and dealing with noise.
So, let’s start the Python Speech recognition Tutorial.
What is Python Speech Recognition?
Speech recognition has come a long way: from systems that supported a single speaker and a vocabulary of around a dozen words, to systems that recognize speech from multiple speakers and have huge vocabularies in many languages.
Here is how it works. A microphone converts speech from physical sound into electrical signals, and an analogue-to-digital converter turns those signals into digital data.
Finally, models transcribe the digital audio to text. In recognizers based on the Hidden Markov Model (HMM), for example, the speech signal is divided into fragments of about 10 milliseconds.
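To make the idea of 10-millisecond fragments concrete, here is a minimal standard-library sketch that splits a sequence of PCM samples into consecutive, non-overlapping 10 ms frames. The helper name split_into_frames() is our own. Real recognizers typically use overlapping analysis windows (for example 25 ms windows advanced every 10 ms) and compute acoustic features from each one, so treat this only as an illustration of the frame arithmetic.

```python
import array

def split_into_frames(samples, sample_rate, frame_ms=10):
    """Split PCM samples into consecutive, non-overlapping frames of frame_ms.

    A trailing partial frame is dropped; padding it with zeros is another
    common convention.
    """
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

# Example: one second of silent 16 kHz audio as signed 16-bit samples.
samples = array.array("h", [0] * 16000)
frames = split_into_frames(samples, 16000)
print(len(frames), len(frames[0]))  # 100 frames of 160 samples each
```

At 16 kHz, 10 ms is 160 samples, so one second of audio yields 100 frames.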
a. Available APIs in Python Speech Recognition
Several speech recognition packages are available for Python:
- apiai
- assemblyai
- google-cloud-speech
- pocketsphinx
- SpeechRecognition
- watson-developer-cloud
- wit
Some packages, like wit and apiai, offer more than basic speech recognition. Here, though, we will demonstrate SpeechRecognition, which is easier to use. It hard-codes a default API key for the Google Web Speech API, so you can use that API without registering for a key of your own.
b. Supported File Types in Python Speech Recognition
- WAV (PCM/LPCM format)
- AIFF
- AIFF-C
- FLAC
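For WAV files, a quick way to check that a file is plain PCM audio before handing it to the recognizer is Python's standard wave module, which only reads uncompressed PCM WAV data and raises wave.Error for other encodings. The sketch below is our own helper (describe_wav() is a hypothetical name), not part of SpeechRecognition, and it does not cover AIFF or FLAC.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Return basic header information for a PCM WAV file.

    wave.open() raises wave.Error for encodings it does not support
    (for example compressed WAV variants); we let that propagate.
    """
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": wf.getframerate(),
            "frames": wf.getnframes(),
            "compression": wf.getcomptype(),  # "NONE" means uncompressed PCM
        }

# Usage: write one second of 16-bit, 8 kHz mono silence, then inspect it.
path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)      # mono
    out.setsampwidth(2)      # 16-bit samples
    out.setframerate(8000)   # 8 kHz
    out.writeframes(b"\x00\x00" * 8000)

print(describe_wav(path))
```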
c. Prerequisites for Python Speech Recognition
You can install SpeechRecognition with pip:
pip install SpeechRecognition
To test the installation, import the package in the interpreter and check its version:
>>> import speech_recognition as sr
>>> sr.__version__
'3.8.1'
We also need a sample audio file, which you can download from this page (the examples below refer to it as demo.wav):
http://www.voiptroubleshooter.com/open_speech/american.html
Reading an Audio File in Python
a. The Recognizer class
First, we make an instance of the Recognizer class.
>>> r = sr.Recognizer()
Recognizer has one method for each supported API:
- recognize_bing()- Microsoft Bing Speech
- recognize_google()- Google Web Speech API
- recognize_google_cloud()- Google Cloud Speech
- recognize_houndify()- Houndify
- recognize_ibm()- IBM Speech to Text
- recognize_sphinx()- CMU Sphinx
- recognize_wit()- Wit.ai
Except for recognize_sphinx(), which works offline, all of these methods need an Internet connection.
b. Capturing data with record()
AudioFile works as a context manager: it opens the file and reads its contents, and record() then records the data into an AudioData instance.
>>> demo = sr.AudioFile('demo.wav')
>>> with demo as source:
...     audio = r.record(source)
...
To confirm this, try:
>>> type(audio)
<class 'speech_recognition.AudioData'>
c. Recognizing Speech in the Audio
Finally, you can call recognize_google() to perform the transcription.
>>> r.recognize_google(audio)
"The Purge can use within The Smurfs the sheet without playback Mount delivery date habitat of a Vow these days it's okay microwave devices are installed in Windows to use of lemons next find the password on the site that the houses such hard core in a garbage for the study core exercises talking is hard disk"
To transcribe audio in a different language, pass the language parameter:
>>> r.recognize_google(audio, language='ro-RO')  # for Romanian
Reading a Segment of Audio
If you only want to read part of an audio file, pass the offset argument, which tells record() where to begin (in seconds), and the duration argument, which tells it how long to listen (also in seconds).
>>> with demo as source:
...     audio = r.record(source, offset=4, duration=3)
...
>>> r.recognize_google(audio)
'clear the sheet without me back'
Note that cutting the audio this way causes problems at the edges of the segment. The segment begins partway through a word, which the recognizer heard as 'murfs' and transcribed as 'clear'. It also heard 'me back' instead of 'playback' because of the noise in the audio.
If we set the offset to 3.3 instead:
>>> with demo as source:
...     audio = r.record(source, offset=3.3, duration=3)
...
>>> r.recognize_google(audio)
'clear the sheet with Ok'
But look at what happens when we set the offset to 2.5:
>>> with demo as source:
...     audio = r.record(source, offset=2.5, duration=3)
...
>>> r.recognize_google(audio)
'National thanks'
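To see exactly which samples an offset and a duration refer to (for example, to export the segment and listen to it in an audio editor), here is a standard-library sketch that seeks to the offset and reads duration seconds of frames from a PCM WAV file. read_segment() is a hypothetical helper of our own; it illustrates the seconds-to-frames arithmetic and is not how record() is implemented.

```python
import array
import os
import tempfile
import wave

def read_segment(path, offset, duration):
    """Return (raw PCM bytes, frame rate) covering `duration` seconds
    that start `offset` seconds into a WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        # Convert seconds to frame positions; round() avoids float truncation.
        start = min(round(offset * rate), wf.getnframes())
        count = round(duration * rate)
        wf.setpos(start)  # jump to the first frame of the segment
        return wf.readframes(count), rate  # may be shorter near the end

# Usage: a 2-second, 8 kHz, 16-bit mono file whose sample values count up from 0.
path = os.path.join(tempfile.mkdtemp(), "ramp.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(array.array("h", range(16000)).tobytes())

data, rate = read_segment(path, offset=0.5, duration=0.25)
samples = array.array("h", data)
print(len(samples), samples[0])  # 2000 samples, starting at sample 4000
```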
Python Speech Recognition – Dealing with Noise
Let's face it: there will always be some noise, no matter how professional the equipment you use to record your audio. So we had better learn to deal with it.
The adjust_for_ambient_noise() method reads the first second of the file stream to calibrate the recognizer to the noise level of the audio. The audio it reads is consumed in the process, so it does not make it into the transcription.
>>> with demo as source:
...     r.adjust_for_ambient_noise(source)
...     audio = r.record(source, offset=2.5, duration=3)
...
>>> r.recognize_google(audio)
'clear the sheet'
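The idea behind noise calibration can be sketched with the standard library: measure the energy (root-mean-square amplitude) of the first second of audio in small chunks, then set a threshold somewhat above it so that only louder sounds count as speech. This is a simplified illustration under our own assumptions (a plain average and a scaling factor of 1.5); SpeechRecognition's actual code updates r.energy_threshold with a damped running average, so its numbers will differ. After calling adjust_for_ambient_noise(), you can inspect the real value in r.energy_threshold.

```python
import array
import math

def rms(chunk):
    """Root-mean-square amplitude of a sequence of 16-bit PCM samples."""
    if len(chunk) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))

def estimate_energy_threshold(samples, sample_rate, duration=1.0,
                              chunk_size=1024, ratio=1.5):
    """Simplified calibration (our own assumption, not the library's code):
    average the RMS of fixed-size chunks from the first `duration` seconds,
    then scale by `ratio` so speech must be louder than the background."""
    noise = samples[:int(duration * sample_rate)]
    energies = [rms(noise[i:i + chunk_size])
                for i in range(0, len(noise), chunk_size)]
    if not energies:
        return 0.0
    return ratio * sum(energies) / len(energies)

# Usage: one second of steady low-level "noise" at 8 kHz (16-bit samples).
noise = array.array("h", [100, -100] * 4000)
print(estimate_energy_threshold(noise, 8000))  # 150.0
```

With real files, you would obtain the samples with array.array("h", wf.readframes(n)) for 16-bit mono audio.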
We can pass a duration argument that tells it how long to listen for noise while calibrating the recognizer. Look how a difference of only 0.005 seconds produces two entirely different outputs:
>>> with demo as source:
...     r.adjust_for_ambient_noise(source, duration=0.51)
...     audio = r.record(source, offset=2.5, duration=3)
...
>>> r.recognize_google(audio)
'National thanks'
>>> with demo as source:
...     r.adjust_for_ambient_noise(source, duration=0.515)
...     audio = r.record(source, offset=2.5, duration=3)
...
>>> r.recognize_google(audio)
'clear the sheet'
As you can see, adjust_for_ambient_noise() is no miracle worker. To get better results, you can preprocess the audio with audio-editing software such as Audacity.
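Why would 0.005 seconds matter so much? One plausible explanation, which we have not verified for every version of the library, rests on three assumptions: AudioFile hands audio to the recognizer in chunks of 4096 frames; adjust_for_ambient_noise() adds one chunk's length to an elapsed-time counter and stops before reading whenever that counter exceeds duration; and the sample file is sampled at 8 kHz, so each chunk lasts 0.512 seconds. Under those assumptions, duration=0.51 reads no chunk at all (so the output matches the uncalibrated 'National thanks'), while duration=0.515 and the default of 1 second both consume one 0.512-second chunk before record() applies its offset (matching 'clear the sheet'). The hypothetical chunks_consumed() below simulates that loop; check the source of your installed version before relying on this.

```python
def chunks_consumed(duration, chunk_frames=4096, sample_rate=8000):
    """Simulate the assumed calibration loop: add one chunk's length to the
    elapsed time, stop if it now exceeds `duration`, otherwise read the chunk.
    chunk_frames and sample_rate are assumptions, not values read from the library."""
    seconds_per_chunk = chunk_frames / sample_rate  # 4096 / 8000 = 0.512 s
    elapsed = 0.0
    chunks = 0
    while True:
        elapsed += seconds_per_chunk
        if elapsed > duration:  # checked before reading the next chunk
            break
        chunks += 1             # the real loop would read one chunk here
    return chunks

for d in (0.51, 0.515, 1.0):
    n = chunks_consumed(d)
    print(f"duration={d}: {n} chunk(s), {n * 4096 / 8000:.3f} s of audio consumed")
```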
Working With Microphones
To use speech recognition with your own voice, you need the PyAudio package. You can install it with pip:
pip install PyAudio
Alternatively, you can download a prebuilt binary (wheel file) and install it with pip. Download link:
https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio
Then:
pip install [file_name_for_binary]
For example:
pip install PyAudio-0.2.11-cp37-cp37m-win32.whl
a. The Microphone class
Just as we used AudioFile for audio files, we need the Microphone class for real-time speech data. Since we installed a new package, let's exit the interpreter and start a new session.
>>> import speech_recognition as sr
>>> r = sr.Recognizer()
Now, let’s create an instance of Microphone.
>>> mic = sr.Microphone()
Microphone has a static method, list_microphone_names(), that lists the names of all available audio devices:
>>> sr.Microphone.list_microphone_names()
['Microsoft Sound Mapper - Input', 'Microphone (Realtek High Defini', 'Microsoft Sound Mapper - Output', 'Speakers (Realtek High Definiti', 'Primary Sound Capture Driver', 'Microphone (Realtek High Definition Audio)', 'Primary Sound Driver', 'Speakers (Realtek High Definition Audio)', 'Speakers (Realtek High Definition Audio)', 'Microphone (Realtek High Definition Audio)', 'Speakers (Realtek HD Audio output)', 'Line In (Realtek HD Audio Line input)', 'Microphone (Realtek HD Audio Mic input)', 'Stereo Mix (Realtek HD Audio Stereo input)']
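You can then select a specific microphone by passing its position in this list (counting from 0) as device_index. Note that the list includes output devices such as speakers too; in the list above, index 1 is a microphone:
>>> mic = sr.Microphone(device_index=1)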
But let’s stick with the default for now.
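Hard-coding an index is fragile, because device order can change between machines and sessions. A small sketch that picks the first device whose name contains a keyword is shown below; find_device_index() is a hypothetical helper of our own, and the commented usage assumes SpeechRecognition and PyAudio are installed.

```python
def find_device_index(names, keyword):
    """Return the index of the first device name containing `keyword`
    (case-insensitive), or None if no name matches."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if keyword in name.lower():
            return index
    return None

# Usage with SpeechRecognition (requires PyAudio):
# import speech_recognition as sr
# idx = find_device_index(sr.Microphone.list_microphone_names(), "microphone")
# mic = sr.Microphone(device_index=idx)  # device_index=None uses the default device

names = ['Microsoft Sound Mapper - Input', 'Microphone (Realtek High Defini',
         'Microsoft Sound Mapper - Output', 'Speakers (Realtek High Definiti']
print(find_device_index(names, "microphone"))  # 1
```

Returning None when nothing matches lets the call fall back to the default device.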
b. Capturing Microphone Input
With the context manager, we capture input using the listen() method.
>>> with mic as source:
...     audio = r.listen(source)
...
Now speak into your microphone. When listen() detects silence, it stops listening and the interpreter prompt (>>>) reappears.
>>> r.recognize_google(audio)
'decease a test'
You can call the adjust_for_ambient_noise() method with Microphone too.
>>> with mic as source:
...     r.adjust_for_ambient_noise(source)
...     audio = r.listen(source)
...
>>> r.recognize_google(audio)
'this is a test'
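How does listen() decide that you have stopped talking? Roughly, it compares the energy of each incoming chunk with r.energy_threshold (300 by default) and stops once it has heard r.pause_threshold seconds (0.8 by default) of quieter audio after speech. The sketch below models only that rule; the real method has further settings (such as the phrase_time_limit argument of listen()), and the 16 kHz sample rate and 1024-frame chunk size here are our assumptions. phrase_end_chunk() is a hypothetical helper.

```python
import math

def phrase_end_chunk(energies, energy_threshold=300, pause_threshold=0.8,
                     seconds_per_chunk=1024 / 16000):
    """Return the index of the chunk after which listening would stop, or None.

    Simplified model: speech starts at the first chunk louder than
    energy_threshold; listening stops once consecutive quiet chunks
    add up to at least pause_threshold seconds.
    """
    needed_quiet = math.ceil(pause_threshold / seconds_per_chunk)  # 13 chunks here
    started = False
    quiet = 0
    for i, energy in enumerate(energies):
        if energy > energy_threshold:
            started = True
            quiet = 0  # any loud chunk resets the silence counter
        elif started:
            quiet += 1
            if quiet >= needed_quiet:
                return i
    return None

# Usage: 5 quiet chunks, 10 loud "speech" chunks, then 20 quiet chunks.
energies = [50] * 5 + [1000] * 10 + [50] * 20
print(phrase_end_chunk(energies))  # 27
```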
c. Unintelligible Speech
When the API cannot match the audio to any text, recognize_google() raises an UnknownValueError exception:
>>> r.recognize_google(audio)
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    r.recognize_google(audio)
  File "C:\Users\Ram\AppData\Local\Programs\Python\Python37-32\lib\site-packages\speech_recognition\__init__.py", line 858, in recognize_google
    if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
speech_recognition.UnknownValueError
Audio that can cause this includes coughing, gagging sounds, hand claps, and tongue clicks.
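In a script you usually want to handle this case instead of crashing. A minimal sketch, assuming you pass in the recognizer method and the exception class to treat as "no match", is shown below; transcribe_or_none() is a hypothetical helper of our own. sr.RequestError, which the library raises when the request to the API fails (for example, with no Internet connection), is deliberately not caught here, so a network problem is not mistaken for unintelligible audio.

```python
def transcribe_or_none(recognize, audio, no_match_error):
    """Call recognize(audio) and return its text, or None if the recognizer
    raises no_match_error (e.g. sr.UnknownValueError). Other exceptions propagate."""
    try:
        return recognize(audio)
    except no_match_error:
        return None

# Usage with SpeechRecognition:
# import speech_recognition as sr
# text = transcribe_or_none(r.recognize_google, audio, sr.UnknownValueError)
# if text is None:
#     print("Sorry, I could not understand that.")
```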
That was all for Python Speech Recognition. We hope you liked our explanation.
Conclusion
Did you see how easy it was to recognize speech with Python? The APIs made it possible. Why this belongs in an AI tutorial hardly needs explaining: speech recognition is an integral part of Artificial Intelligence.
What would Siri or Alexa be without it? So, in conclusion, in this Python Speech Recognition tutorial we used the SpeechRecognition API to read an audio file in Python.
We also saw how to read a segment of audio and how to deal with noise. Feel free to tell us about your reading experience in the comments.