Python Speech Recognition – Artificial Intelligence

Welcome to our Python Speech Recognition tutorial. In this AI with Python tutorial, we will learn to read an audio file with Python, using the SpeechRecognition API to perform the task. We will also discuss reading a segment of audio, dealing with noise, and working with a microphone.

So, let’s start the Python Speech Recognition tutorial.

What is Python Speech Recognition?

Speech recognition has come a long way: from systems that handled a single speaker and a vocabulary of around a dozen words, to systems that recognize many speakers and possess huge vocabularies in various languages.

Here is what happens: a microphone converts speech from physical sound to an electrical signal. Then, an analogue-to-digital converter turns that signal into digital data.
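As a rough illustration of what the analogue-to-digital step produces, the sketch below samples a pure tone and rounds each sample to a 16-bit integer. The function name, the 440 Hz tone, and the 16 kHz sample rate are assumptions chosen for the example; a real converter does this in hardware.

```python
import math


def sample_and_quantize(freq_hz, sample_rate, duration_s, bits=16):
    """Sample a sine tone and quantize each sample to a signed integer."""
    max_amplitude = 2 ** (bits - 1) - 1            # 32767 for 16-bit audio
    n_samples = round(sample_rate * duration_s)    # how many readings we take
    return [
        round(max_amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(n_samples)
    ]


# 10 ms of a 440 Hz tone at 16,000 samples per second (assumed values)
samples = sample_and_quantize(freq_hz=440, sample_rate=16000, duration_s=0.01)
print(len(samples), min(samples), max(samples))
```

A higher sample rate or bit depth captures the signal more faithfully, at the cost of more data for the recognizer to process.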

Finally, we use one or more models to transcribe the audio to text. One such model is the Hidden Markov Model (HMM); with it, we divide the speech signal into 10-millisecond fragments.
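To make the 10-millisecond fragments concrete, here is a minimal sketch that cuts a list of samples into fixed-length frames. The function name and the 16 kHz sample rate are assumptions for illustration; real recognizers usually go on to extract features from each frame, which is not shown here.

```python
def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut samples into consecutive frames of frame_ms milliseconds each."""
    frame_len = sample_rate * frame_ms // 1000     # samples per frame
    last_start = len(samples) - frame_len          # drop a trailing partial frame
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


one_second = [0] * 16000                           # one second of silence at 16 kHz
frames = split_into_frames(one_second, sample_rate=16000)
print(len(frames), len(frames[0]))
```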

a. Available APIs in Python Speech Recognition

With Python, we have several APIs available:

  • apiai
  • assemblyai
  • google-cloud-speech
  • pocketsphinx
  • SpeechRecognition
  • watson-developer-cloud
  • wit

Some Python packages, like wit and apiai, offer more than just basic speech recognition. Here, though, we will demonstrate SpeechRecognition, which is easier to use. It ships with a default API key for the Google Web Speech API, so we can try it without setting up a key of our own.

b. Supported File Types in Python Speech Recognition

  • WAV (PCM/LPCM format)
  • AIFF
  • AIFF-C
  • FLAC
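If you are unsure whether a WAV file is plain PCM, the standard-library wave module offers a quick check, since it only opens PCM-encoded WAV files. The sketch below is one possible way to inspect a file; the helper name describe_wav and the generated silence.wav file are assumptions made so the example runs without any external audio.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic format details of a PCM WAV file."""
    with wave.open(path, "rb") as wav:             # raises wave.Error if unreadable
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "silence.wav")
    # Write one second of 16-bit mono silence to have something to inspect.
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 16000)
    info = describe_wav(path)

print(info)
```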

c. Prerequisites for Python Speech Recognition

You can use pip to install SpeechRecognition-

pip install SpeechRecognition

To test the installation, you can import the package in the interpreter and check its version-

>>> import speech_recognition as sr
>>> sr.__version__


We also need a sample audio file. The examples below read one saved as demo.wav in the working directory.

Reading an Audio File in Python

a. The Recognizer class

First, we make an instance of the Recognizer class.

>>> r=sr.Recognizer()

Recognizer has a method for each API-

  • recognize_bing()- Microsoft Bing Speech
  • recognize_google()- Google Web Speech API
  • recognize_google_cloud()- Google Cloud Speech
  • recognize_houndify()- Houndify
  • recognize_ibm()- IBM Speech to Text
  • recognize_sphinx()- CMU Sphinx
  • recognize_wit()- Wit.ai

Except for recognize_sphinx(), which works offline, every one of these methods needs an Internet connection.

b. Capturing data with record()

We can have the context manager open the file, then use record() to read its contents into an AudioData instance.

>>> demo=sr.AudioFile('demo.wav')
>>> with demo as source:
...     audio=r.record(source)

To confirm this, try:

>>> type(audio)

<class ‘speech_recognition.AudioData’>

c. Recognizing Speech in the Audio

Finally, you can call recognize_google() to perform the transcription.

>>> r.recognize_google(audio)

“The Purge can use within The Smurfs the sheet without playback Mount delivery date habitat of a Vow these days it’s okay microwave devices are installed in Windows to use of lemons next find the password on the site that the houses such hard core in a garbage for the study core exercises talking is hard disk”

To transcribe audio in a different language, you can use the language parameter-

>>> r.recognize_google(audio,language='ro-RO') #for Romanian
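The interpreter steps so far can be collected into one small function. This is only a sketch: the name transcribe_file is our own, and the recognizer and audio file are passed in as arguments so the function can be tried with any objects that behave like sr.Recognizer and sr.AudioFile.

```python
def transcribe_file(recognizer, audio_file, language="en-US"):
    """Record a whole audio file and return its transcript."""
    with audio_file as source:               # opens the file, closes it on exit
        audio = recognizer.record(source)    # read everything into AudioData
    return recognizer.recognize_google(audio, language=language)


# Typical use (needs SpeechRecognition, demo.wav and an Internet connection):
#   import speech_recognition as sr
#   print(transcribe_file(sr.Recognizer(), sr.AudioFile("demo.wav")))
#   print(transcribe_file(sr.Recognizer(), sr.AudioFile("demo.wav"), language="ro-RO"))
```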

Reading a Segment of Audio

When you only want to read a part of your audio file, you can pass record() two arguments: offset, which tells it where to begin (in seconds), and duration, which tells it how long to listen (also in seconds).

>>> with demo as source:
...     audio=r.record(source,offset=4,duration=3)  # illustrative values
>>> r.recognize_google(audio)

‘clear the sheet without me back’

Note that this caused issues at the edges of the segment. It heard ‘murfs’, which it transcribed as ‘clear’. It also heard ‘me back’ instead of ‘playback’ because of the noise in the audio.

If we set the offset to 3.3,

>>> with demo as source:
...     audio=r.record(source,offset=3.3,duration=3)  # duration is illustrative
>>> r.recognize_google(audio)

‘clear the sheet with Ok’

But check what happens when we set the offset to 2.5-

>>> with demo as source:
...     audio=r.record(source,offset=2.5,duration=3)  # duration is illustrative
>>> r.recognize_google(audio)

‘National thanks’
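Because small shifts in offset change the result so much, it can help to transcribe several windows in one go and compare them. The helper below is a sketch under that assumption; transcribe_windows is a name we made up, and it expects objects that behave like sr.Recognizer and sr.AudioFile.

```python
def transcribe_windows(recognizer, audio_file, offsets, duration):
    """Transcribe one window of `duration` seconds starting at each offset."""
    transcripts = {}
    for offset in offsets:
        with audio_file as source:           # re-entering reads from the start again
            audio = recognizer.record(source, offset=offset, duration=duration)
        transcripts[offset] = recognizer.recognize_google(audio)
    return transcripts


# Typical use (needs SpeechRecognition, demo.wav and an Internet connection):
#   import speech_recognition as sr
#   results = transcribe_windows(sr.Recognizer(), sr.AudioFile("demo.wav"),
#                                offsets=[2.5, 3.3, 4], duration=3)
#   for offset, text in results.items():
#       print(offset, text)
```

Comparing the windows side by side makes it easier to pick an offset that does not cut a word in half.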

Python Speech Recognition – Dealing with Noise

Okay, let’s face it. There will always be noise, no matter how professional the equipment you use to record your audio. So we had better learn to deal with it.

The method adjust_for_ambient_noise() reads the first second of a file stream to calibrate the recognizer to the audio’s noise level. This often consumes that part of the audio, and it doesn’t make it to the transcription.

>>> with demo as source:
...     r.adjust_for_ambient_noise(source)
...     audio=r.record(source)
>>> r.recognize_google(audio)

‘clear the sheet’

We can pass this method a duration argument (in seconds) to set how long it listens for noise while calibrating the recognizer. Let’s see how it produces two entirely different outputs for a difference as small as 0.005 seconds-

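>>> with demo as source:
...     r.adjust_for_ambient_noise(source,duration=0.5)  # illustrative value
...     audio=r.record(source)
>>> r.recognize_google(audio)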

‘National thanks’

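>>> with demo as source:
...     r.adjust_for_ambient_noise(source,duration=0.505)  # illustrative value
...     audio=r.record(source)
>>> r.recognize_google(audio)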

‘clear the sheet’

As you can see, adjust_for_ambient_noise() is definitely not a miracle worker. To get around this, you can preprocess the audio with audio-editing software such as Audacity.
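To see why a tiny change in calibration time can flip the result, it may help to picture calibration as measuring the loudness of a noise-only stretch and setting a cut-off just above it. The sketch below is a simplified model of that idea, not the library’s actual code; the function names and the 1.5 ratio are assumptions made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a chunk of samples (a rough loudness measure)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_energy_threshold(noise_chunks, ratio=1.5):
    """Average the loudness of noise-only chunks, then add a safety margin."""
    levels = [rms(chunk) for chunk in noise_chunks]
    return ratio * sum(levels) / len(levels)


def is_speech(chunk, threshold):
    """Treat a chunk as speech only if it is louder than the threshold."""
    return rms(chunk) > threshold


quiet_room = [[3, -3, 3, -3], [1, -1, 1, -1]]      # made-up noise samples
threshold = estimate_energy_threshold(quiet_room)
print(threshold, is_speech([10, -10, 10, -10], threshold))
```

In this model, the threshold depends entirely on which samples fall inside the calibration window, so a slightly longer or shorter window that happens to include a louder moment shifts the cut-off for everything that follows.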

Working With Microphones

To be able to work with your own voice with speech recognition, you need the PyAudio package. You can install it with pip-

pip install PyAudio

Or you can download a prebuilt binary (a wheel file) that matches your Python version and platform, then install it with pip-

pip install [file_name_for_binary]

For example:

pip install PyAudio-0.2.11-cp37-cp37m-win32.whl

a. The Microphone class

Just as we used AudioFile for recorded audio, we will need Microphone for real-time speech data. Since we installed new packages, let’s exit our interpreter and open another session.

>>> import speech_recognition as sr
>>> r=sr.Recognizer()

Now, let’s create an instance of Microphone.

>>> mic=sr.Microphone()

Microphone has a static method to list out all microphones available-

>>> sr.Microphone.list_microphone_names()

[‘Microsoft Sound Mapper – Input’, ‘Microphone (Realtek High Defini’, ‘Microsoft Sound Mapper – Output’, ‘Speakers (Realtek High Definiti’, ‘Primary Sound Capture Driver’, ‘Microphone (Realtek High Definition Audio)’, ‘Primary Sound Driver’, ‘Speakers (Realtek High Definition Audio)’, ‘Speakers (Realtek High Definition Audio)’, ‘Microphone (Realtek High Definition Audio)’, ‘Speakers (Realtek HD Audio output)’, ‘Line In (Realtek HD Audio Line input)’, ‘Microphone (Realtek HD Audio Mic input)’, ‘Stereo Mix (Realtek HD Audio Stereo input)’]

Each name’s position in this list is its device index. You can select a particular microphone by passing that index, as in the following piece of code, where index 1 is ‘Microphone (Realtek High Defini’-

>>> mic=sr.Microphone(device_index=1)
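Counting positions by hand gets tedious with a long device list. As a small sketch, the helper below searches the names for a keyword and returns the matching index; find_device_index is a hypothetical name of our own, and it works on any list of names such as the one list_microphone_names() returns.

```python
def find_device_index(device_names, keyword):
    """Return the index of the first device whose name contains keyword."""
    keyword = keyword.lower()
    for index, name in enumerate(device_names):
        if keyword in name.lower():
            return index
    return None                                    # nothing matched


names = [
    "Microsoft Sound Mapper - Input",
    "Microphone (Realtek High Defini",
    "Microsoft Sound Mapper - Output",
    "Speakers (Realtek High Definiti",
]
print(find_device_index(names, "microphone"))      # → 1

# Typical use (needs SpeechRecognition and PyAudio):
#   import speech_recognition as sr
#   index = find_device_index(sr.Microphone.list_microphone_names(), "microphone")
#   mic = sr.Microphone(device_index=index)
```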

But let’s stick with the default for now.

b. Capturing Microphone Input

With the context manager, we capture input using the listen() method.

>>> with mic as source:
...     audio=r.listen(source)

Now speak into your microphone. When listen() detects silence, it stops recording and the interpreter prompt (>>>) returns.

>>> r.recognize_google(audio)

decease a test

You can call the adjust_for_ambient_noise() method with Microphone too.

>>> with mic as source:
...     r.adjust_for_ambient_noise(source)
...     audio=r.listen(source)
>>> r.recognize_google(audio)

this is a test
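The calibrate-then-listen pattern can be wrapped in one function. This is a sketch rather than a recommended recipe: listen_once is our own name, the one-second calibration and five-second timeout are assumed defaults, and the function expects objects that behave like sr.Recognizer and sr.Microphone.

```python
def listen_once(recognizer, microphone, calibrate_s=1.0, timeout=5.0):
    """Calibrate for background noise, capture one phrase, and transcribe it."""
    with microphone as source:
        # Sample the room first so quiet background noise is not taken for speech.
        recognizer.adjust_for_ambient_noise(source, duration=calibrate_s)
        # Give up if nobody starts speaking within `timeout` seconds.
        audio = recognizer.listen(source, timeout=timeout)
    return recognizer.recognize_google(audio)


# Typical use (needs SpeechRecognition, PyAudio and an Internet connection):
#   import speech_recognition as sr
#   print(listen_once(sr.Recognizer(), sr.Microphone()))
```

With a timeout set, listen() raises sr.WaitTimeoutError when no speech starts in time, so a caller may want to catch that.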

c. Unintelligible Speech

When the recognizer cannot match some audio to text, it raises an UnknownValueError exception.

>>> r.recognize_google(audio)

Traceback (most recent call last):
 File “<pyshell#7>”, line 1, in <module>
 File “C:\Users\Ram\AppData\Local\Programs\Python\Python37-32\lib\site-packages\speech_recognition\”, line 858, in recognize_google
   if not isinstance(actual_result, dict) or len(actual_result.get(“alternative”, [])) == 0: raise UnknownValueError()

Sounds that can lead to this include coughing, gagging, hand claps, and tongue clicks.
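In a script, you will usually want to catch this exception instead of letting it end the program. The sketch below shows one way to do so; transcribe_or_none is a hypothetical helper, and the stand-in exception classes exist only so the example still loads when SpeechRecognition is not installed.

```python
try:
    from speech_recognition import RequestError, UnknownValueError
except ImportError:
    # Stand-ins so the sketch still loads without the package installed.
    class UnknownValueError(Exception):
        pass

    class RequestError(Exception):
        pass


def transcribe_or_none(recognizer, audio):
    """Return the transcript, or None when the speech is unintelligible."""
    try:
        return recognizer.recognize_google(audio)
    except UnknownValueError:
        return None                      # coughs, claps, clicks and the like
    except RequestError as exc:
        # The API could not be reached or it rejected the request.
        raise RuntimeError(f"speech service unavailable: {exc}") from exc
```

Returning None lets the caller decide whether to ask the speaker to repeat, while a failed request is raised again because retrying the same audio will not help until the connection is back.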

So, this was all about Python Speech Recognition. We hope you liked our explanation.

Did you see how easy it was to recognize speech with Python? The APIs made it possible. Why we placed this in the AI tutorial needs little explanation: speech recognition forms an integral part of Artificial Intelligence.

What would Siri or Alexa be without it? In this Python Speech Recognition tutorial, we used the SpeechRecognition API to read an audio file in Python.

We also covered reading a segment of audio, dealing with noise, and working with microphones. Feel free to tell us about your experience of reading this article in the comments.

6 Responses

  1. Ustadhi Mustafa says:

    Thanks, My first successful project.

    • DataFlair Team says:

      Hey Ustadhi Mustafa,

      We are happy to help you. You can also refer our sidebar for more such interesting articles.

  2. Sijin John says:

    Its is giving only half text of my audio

  3. ramesh says:

    i went clear explin

  4. kali says:

    need help to avoid choppy audio receive from sr.listen(source).
    After saving the audio using get_wav_data, and play the same with VLC, audio just like delayed and choppy.
    I think somehow the sample rate mismatch happening. if use sound device to receive the sound its proper sound.
    I am using windows 10.
    How to fix this issue?
    can I use sound_device instance and pass the data to speech_recognition?

  5. Ahmad Hamizan bin Hamzah says:

    why nothing happened when I run the script read the audio? no error also
