Speech Recognition from MP3 to Text
Tahsin Abrar
Posted on October 9, 2024
This project demonstrates how to convert an MP3 audio file to WAV format and then use Google Speech Recognition to transcribe the audio into Bengali text. The code uses the SpeechRecognition and pydub libraries to handle audio processing.
Installation
Before running the code, install the required packages. In a Google Colab environment, the Python libraries are installed with pip and ffmpeg with apt-get:
!pip install SpeechRecognition pydub
!apt-get install ffmpeg
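pydub relies on ffmpeg to decode MP3 files, so it can be worth confirming that the install succeeded before going further. The following is a minimal sketch using only the standard library; the helper name ffmpeg_available is hypothetical and not part of the original script.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


if ffmpeg_available():
    print("ffmpeg found at:", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found; MP3 decoding with pydub will likely fail.")
```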
Usage
- Upload Your MP3 File: When prompted by the script, upload an MP3 audio file.
- Conversion: The script will automatically convert the MP3 file to WAV format.
- Transcription: It will then use Google Speech Recognition to transcribe the audio into Bengali text.
- Output: The transcribed text will be printed in the output section of the notebook.
Example
Simply run the script, upload an MP3 file, and view the printed output text in Bengali.
Code Explanation
Here's a breakdown of the code:
- Library Imports: The necessary libraries are imported: files for file upload, AudioSegment from pydub for audio processing, and speech_recognition for speech-to-text conversion.
- File Upload: The script lets the user upload an MP3 file using the files.upload() method.
- Audio Conversion: The uploaded MP3 file is loaded with AudioSegment.from_mp3() and then exported in WAV format.
- Speech Recognition: The speech_recognition library reads the audio data from the WAV file, and the recognize_google() method is called to transcribe it into Bengali text. The recognized text is printed, or an error message is shown if the audio cannot be understood or the request to the service fails.
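The file-handling steps above can be sketched on their own. files.upload() returns a dictionary keyed by filename, and the reference script simply takes the first key. The variant below, under the assumption that a non-MP3 file might be uploaded by mistake, picks the first name ending in .mp3 and derives a matching WAV filename. The helpers first_mp3 and wav_name_for are hypothetical names introduced for illustration, and the uploaded dictionary is a stand-in so the snippet runs outside Colab.

```python
from pathlib import Path


def first_mp3(uploaded: dict) -> str:
    """Return the first uploaded filename ending in .mp3 (case-insensitive)."""
    for name in uploaded:
        if name.lower().endswith(".mp3"):
            return name
    raise ValueError("No MP3 file was uploaded.")


def wav_name_for(mp3_name: str) -> str:
    """Derive a WAV filename from the MP3 name by swapping the extension."""
    return str(Path(mp3_name).with_suffix(".wav"))


# Stand-in for the dict returned by files.upload() (filename -> file bytes).
uploaded = {"notes.txt": b"", "speech.mp3": b""}

mp3_file = first_mp3(uploaded)
wav_file = wav_name_for(mp3_file)
print(mp3_file, "->", wav_file)
```

In the notebook, mp3_file and wav_file would then be passed to AudioSegment.from_mp3() and audio.export() exactly as in the reference code.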
Reference Code
Here is the full reference code for the project:
# Import necessary libraries
from google.colab import files
import os
from pydub import AudioSegment
import speech_recognition as sr

# Upload an MP3 file
uploaded = files.upload()

# Convert MP3 to WAV
mp3_file = next(iter(uploaded))
wav_file = "converted.wav"

# Load the MP3 file
audio = AudioSegment.from_mp3(mp3_file)

# Export as WAV
audio.export(wav_file, format="wav")

# Initialize the recognizer
recognizer = sr.Recognizer()

# Perform speech recognition on the WAV file
with sr.AudioFile(wav_file) as source:
    audio_data = recognizer.record(source)

    try:
        # Recognize speech using Google Speech Recognition
        text = recognizer.recognize_google(audio_data, language='bn-BD')
        print("Bengali Text:", text)
    except sr.UnknownValueError:
        print("Sorry, I could not understand the audio.")
    except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition service; {e}")
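The reference code sends the whole recording in a single request. If a long recording fails or comes back incomplete (an assumption about the service, not something this post tests), one option is to split the audio into shorter windows and transcribe each one separately. The sketch below only computes the window boundaries in milliseconds; chunk_bounds and the 30-second default are hypothetical choices made for illustration.

```python
def chunk_bounds(total_ms: int, chunk_ms: int = 30_000) -> list:
    """Split a duration into consecutive (start, end) windows in milliseconds."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


# A 70-second recording split into windows of at most 30 seconds.
bounds = chunk_bounds(70_000)
print(bounds)

# With pydub, len(audio) is the duration in milliseconds and audio[start:end]
# slices by milliseconds, so each window could be exported like this:
# for i, (start, end) in enumerate(chunk_bounds(len(audio))):
#     audio[start:end].export(f"chunk_{i}.wav", format="wav")
```

Each exported chunk could then go through the same sr.AudioFile and recognize_google() steps as the full file. Cutting at fixed intervals may split a word in half, so the joined text can contain errors at the boundaries.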
Requirements
- Python 3.x
- Google Colab (for easy execution of the code)
- Audio file in MP3 format