Convert Speech to Text in Python

Speech recognition is a technology that converts spoken language into written text. Speech recognition systems use various algorithms and techniques to interpret the acoustic patterns of speech and map them to the corresponding text. The resulting transcript can be used for purposes such as text analysis, natural language processing, or voice-driven applications.

Install the Following Packages for Speech to Text

Python Speech Recognition module:

pip install speechrecognition

pip install pyaudio

The pyaudio library provides access to the microphone, and pyttsx3 converts text into spoken words. pyttsx3 has several features, including support for multiple TTS engines, changing voice properties, and adjusting the speech rate.

pip install pyttsx3
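
Before running the examples, it can help to confirm that the three packages are importable. The following is a minimal sketch using only the standard library; the helper name find_missing is an assumption made for illustration. Note that the import name for the speechrecognition package is speech_recognition.

```python
import importlib.util

# Import names, which can differ from the pip package names:
# speechrecognition -> speech_recognition, pyaudio -> pyaudio, pyttsx3 -> pyttsx3
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "pyttsx3"]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that the import system cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All speech packages are installed.")
```

If any module is reported as missing, rerun the matching pip install command above.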

Speech Input Using a Microphone and Conversion of Speech to Text

Run the Following Code

import speech_recognition as sr

def speech_to_text():
    # Create a speech recognition object
    recognizer = sr.Recognizer()

    # Use the default microphone as the audio source
    with sr.Microphone() as source:
        print("Say something....")
        # Calibrate for background noise before listening
        recognizer.adjust_for_ambient_noise(source)
        # Listen for audio input
        audio = recognizer.listen(source)

        try:
            # Use Google's Web Speech API to recognize the speech
            print('You said : \n' + recognizer.recognize_google(audio))

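        except sr.UnknownValueError:
            # Raised when the speech could not be understood (its message is empty)
            print("Error : could not understand the audio")
        except sr.RequestError as e:
            # Raised when the Web Speech API cannot be reached or returns an error
            print("Error : " + str(e))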

if __name__ == "__main__":
    speech_to_text()

That's it. You can now speak into the microphone and see the recognized text as output.
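
The pyttsx3 package installed earlier can read the recognized text back aloud. The following is a minimal sketch, not part of the original script: the function name speak and its default values are assumptions, and the optional engine parameter exists only so the function can be tried without audio hardware. It relies on the pyttsx3 engine methods setProperty, say, and runAndWait.

```python
def speak(text, engine=None, rate=150, volume=1.0):
    """Read `text` aloud. `engine` may be any object with the pyttsx3 engine methods."""
    if engine is None:
        # Imported lazily so the function can be exercised with a stand-in engine
        import pyttsx3
        engine = pyttsx3.init()
    engine.setProperty("rate", rate)      # speaking rate in words per minute
    engine.setProperty("volume", volume)  # 0.0 (silent) to 1.0 (full volume)
    engine.say(text)                      # queue the text
    engine.runAndWait()                   # block until the queued speech has finished
    return engine


# Example (requires pyttsx3 and an audio output device):
# speak("hello welcome to DTI Labs", rate=130)
```

In the script above, the string returned by recognizer.recognize_google(audio) could be passed to speak instead of, or in addition to, being printed.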

Save the Recorded Audio

To save the audio you recorded to a file, use the code below.

import speech_recognition as sr

def speech_to_text():
    # Create a speech recognition object
    recognizer = sr.Recognizer()

    # Use the default microphone as the audio source
    with sr.Microphone() as source:
        print("Say something...")
        # Adjust for background noise
        recognizer.adjust_for_ambient_noise(source)
        # Listen for audio input
        audio = recognizer.listen(source)
        
        print('Recognizing now.....')
        
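        try:
            print('You said : \n' + recognizer.recognize_google(audio))
        except sr.UnknownValueError:
            # Raised when the speech could not be understood (its message is empty)
            print("Error : could not understand the audio")
        except sr.RequestError as e:
            # Raised when the Web Speech API cannot be reached or returns an error
            print("Error : " + str(e))

        # Save the captured audio to a WAV file
        with open('recordedaudio.wav', 'wb') as f:
            f.write(audio.get_wav_data())
        print('Audio recorded successfully.')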

if __name__ == "__main__":
    speech_to_text()

Output:

Say something...
Recognizing now.....
You said : 
hello welcome to DTI Labs
Audio recorded successfully.
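
To confirm that recordedaudio.wav was written correctly, the file can be inspected with Python's standard wave module. This is a minimal sketch; the helper name describe_wav and the dictionary keys are assumptions made for illustration.

```python
import wave


def describe_wav(path):
    """Return basic properties of a WAV file using only the standard library."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }


# Example, after running the recording script:
# print(describe_wav("recordedaudio.wav"))
```

A duration close to the length of time you spoke suggests that the recording was captured and saved in full.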

Conclusion

Speech recognition technology has become an integral part of many modern applications, making human-computer interaction more natural and convenient. I hope you enjoyed this article. Let me know in the comment box below. Thanks.