Audio Transcription with Python

Introduction

Audio transcription is the process of converting speech in an audio or video file into text. Having a transcript of a video or an audio recording has several benefits. Below are some of them:

Expanding the target audience. When a transcript gets translated into several languages, the content opens up to a wider audience.
Making the content more accessible. With a transcript, the content of an audio recording can be accessed readily and accurately, especially when the audio quality is compromised by background noise, low volume, regional accents and so on.
Boosting SEO. With a transcript, the keywords spoken in the audio are available in written form, so search engines can recognize them.

In this article, we are going to learn how to transcribe audio using Python.

Prerequisites

Basic knowledge of Python programming
An AssemblyAI account

Getting an API token

The first thing we will do is get an API token from AssemblyAI.
Let's go to AssemblyAI and create a free account.
Once we have an account, we will sign in and copy the API Key.
The API Key is located on the right side of the home page.

Creating a config file for storing the key

Now that we have an API Key, let's create a config file for storing the key.
We will create a Python file and name it 'api_key.py' (you can give it any name, as long as the import in 'main.py' matches). Then we create a variable and assign the API Key to it.

API_KEY = 'API Key from Assembly AI'
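Hardcoding the key works, but it is easy to commit it to version control by accident. A minimal alternative sketch reads the key from an environment variable instead. The variable name ASSEMBLYAI_API_KEY and the helper load_api_key are our own hypothetical choices, not something AssemblyAI requires.

```python
# api_key.py (alternative sketch): read the key from the environment
import os


def load_api_key(var_name='ASSEMBLYAI_API_KEY'):
    """Return the API key stored in the environment variable var_name.

    ASSEMBLYAI_API_KEY is a hypothetical name chosen for this example.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f'Please set the {var_name} environment variable')
    return key
```

With this version, 'api_key.py' would end with API_KEY = load_api_key(), so 'main.py' can keep using from api_key import API_KEY unchanged.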

After creating the config file, we will create a main file (main.py) where we will write the code for transcribing the audio.

NOTE: 'api_key.py' and 'main.py' should be in the same directory.

Importing requests and API Key

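The first thing that we will do in 'main.py' is to import the requests library and the API Key. (requests is a third-party library; if it isn't installed yet, install it with pip install requests.)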

import requests
from api_key import API_KEY

Uploading Audio to AssemblyAI

Next, let's create a variable 'filename', then get the path of the audio that we want to transcribe and assign this path to 'filename'.

filename = 'audio path'

Let's now create another variable 'upload_endpoint'.

upload_endpoint = 'https://api.assemblyai.com/v2/upload'

Let's also create a variable 'headers' which will be used for authentication. We will use the API Key for authentication.

headers = {'authorization': API_KEY}

Next, let's create a function for reading the audio file.

def read_file(filename, chunk_size=5242880):
    with open(filename, 'rb') as _file:
        while True:
            data = _file.read(chunk_size)
            if not data:
                break
            yield data
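To see what the generator produces, here is a small self-contained check that repeats read_file and runs it on a 12-byte temporary file with a deliberately tiny chunk size of 5 bytes (the default of 5242880 bytes is 5 MiB). Because read_file is a generator, the file is read piece by piece rather than loaded into memory all at once, and requests can send a generator body in chunks.

```python
import os
import tempfile


def read_file(filename, chunk_size=5242880):
    # Same generator as above: yields the file in chunks of at most chunk_size bytes
    with open(filename, 'rb') as _file:
        while True:
            data = _file.read(chunk_size)
            if not data:
                break
            yield data


# The default chunk size is 5 MiB
assert 5242880 == 5 * 1024 * 1024

# Demo: write 12 bytes to a temporary file and read it back 5 bytes at a time
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'0123456789AB')
    path = tmp.name

chunk_lengths = [len(chunk) for chunk in read_file(path, chunk_size=5)]
print(chunk_lengths)  # [5, 5, 2]
os.remove(path)
```

The last chunk is simply whatever is left over, so the file size does not need to be a multiple of the chunk size.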

Let's now send a POST request to upload the file.

upload_response = requests.post(upload_endpoint,
                                headers=headers,
                                data=read_file(filename))

We can print the response to see what kind of response we get.

print(upload_response.json())

The response contains an 'upload_url', which is the URL where the uploaded audio file is stored.

Transcribing Audio

Our next step is to transcribe the uploaded audio.
Let's create a variable 'transcript_endpoint' and assign the transcription endpoint to it.

transcript_endpoint = "https://api.assemblyai.com/v2/transcript"

The transcript endpoint is the same as the upload endpoint, except that it ends with 'transcript' instead of 'upload'.
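Since both endpoints share the same prefix, one option is to build them from a single base URL so the prefix is written only once. This is a small sketch; the name BASE_URL is our own, and the URLs are the ones used in this article.

```python
# Shared prefix of both endpoints used in this article (BASE_URL is a hypothetical name)
BASE_URL = 'https://api.assemblyai.com/v2'

upload_endpoint = f'{BASE_URL}/upload'
transcript_endpoint = f'{BASE_URL}/transcript'

print(upload_endpoint)      # https://api.assemblyai.com/v2/upload
print(transcript_endpoint)  # https://api.assemblyai.com/v2/transcript
```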

Next, let's extract the audio url from the response we got from uploading the audio.

audio_url = upload_response.json()['upload_url']

Let's now create a JSON payload, a Python dictionary that contains the audio URL.

json = { "audio_url": audio_url}

Then we submit the audio for transcription.

transcript_response = requests.post(transcript_endpoint, json=json, headers=headers)
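Passing the dictionary through the json= argument tells requests to serialize it to JSON and send it with a Content-Type: application/json header. The sketch below shows roughly what ends up in the request body, using the standard library's json module and a made-up placeholder URL. Here we call the dictionary payload, because a variable named json would shadow that module if both are used in the same file.

```python
import json

# Placeholder URL for illustration; in main.py this comes from upload_response
audio_url = 'https://example.com/audio.mp3'
payload = {'audio_url': audio_url}

# Roughly what requests sends as the body for json=payload
body = json.dumps(payload)
print(body)  # {"audio_url": "https://example.com/audio.mp3"}
```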

Let's print the response.

print(transcript_response.json())

Below is an excerpt of the response showing only a few of its fields.

{'id': 'ongvqhtbo7-ad52-4272-b695-d7c624b7c2b5', 'language_model': 'assemblyai_default', 'acoustic_model': 'assemblyai_default', 'language_code': 'en_us', 'status': 'queued'}

The response is not the transcript itself because, depending on the length of the audio, it may take a minute or two for the transcript to be ready. Instead, the response contains information about the transcription job.

Our main interest in the response is the 'id', which we will use to ask AssemblyAI whether the transcription job is ready.

Polling

Put simply, polling means repeatedly checking a resource to see what state it is in.

We will now write the code for polling AssemblyAI. We will use this code to continuously check the status of the transcription job so as to know whether the transcription is ready or not.

The first thing we will do is to get the 'id' from the response.

job_id = transcript_response.json()['id']

After getting the job id, let's create a polling endpoint and then send a GET request to it.

polling_endpoint = transcript_endpoint + '/' + job_id # creating a polling endpoint
polling_response = requests.get(polling_endpoint, headers=headers) # get request

Let's print the polling response.

print(polling_response.json())

Just like the transcript response, this response contains a lot of information.
Below is an excerpt showing only a few of its fields.

{'id': 'onse9tjyxv-4164-439c-b19c-ee92ae95a7c1', 'language_model': 'assemblyai_default', 'acoustic_model': 'assemblyai_default', 'language_code': 'en_us', 'status': 'processing'}

For this response, we are interested in the status. While the transcription is not ready, the status is 'queued' or 'processing'; once it is ready, the status is 'completed', and if something goes wrong it is 'error'.
From the response above, we can see that the status is 'processing', meaning that the transcription is not yet ready.

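Let's now create a while loop that keeps polling AssemblyAI until the job finishes. We pause for a few seconds between requests so that we don't flood the API, and we also stop when the status is 'error', since such a job will never complete.

import time  # ideally placed with the other imports at the top of main.py

while True:
    polling_response = requests.get(polling_endpoint, headers=headers)
    status = polling_response.json()['status']
    if status == 'completed':
        print('completed')
        break
    elif status == 'error':
        print('error')
        break
    else:  # 'queued' or 'processing'
        print('Still processing')
    time.sleep(5)  # wait 5 seconds before checking again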
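The loop above has no upper bound on how long it waits. A minimal sketch of a reusable version adds a pause between requests and a time limit. The function name wait_for_transcript, the 5-second interval and the 600-second timeout are our own assumptions, not AssemblyAI requirements. The status request is passed in as a function (fetch_status) so the logic can be tried without network access.

```python
import time


def wait_for_transcript(fetch_status, interval=5, timeout=600, sleep=time.sleep):
    """Poll until the job reaches a final state, or raise after about timeout seconds.

    fetch_status: zero-argument callable returning the job's JSON as a dict.
    interval/timeout are assumed values in seconds; sleep is injectable for testing.
    """
    waited = 0
    while waited <= timeout:
        response = fetch_status()
        # 'completed' and 'error' are final; 'queued' and 'processing' mean keep waiting
        if response['status'] in ('completed', 'error'):
            return response
        print('Still processing')
        sleep(interval)
        waited += interval
    raise TimeoutError(f'Transcript not ready after {timeout} seconds')
```

In 'main.py' it could be called as result = wait_for_transcript(lambda: requests.get(polling_endpoint, headers=headers).json()), after which result['status'] tells us whether the job completed or failed.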

Here's the output from the while loop. As we can see, we keep polling until the status indicates 'completed'.

Still processing
Still processing
Still processing
Still processing
completed

With the status indicating 'completed', let's now print the polling response.

print(polling_response.json())

Below is an excerpt of the response.

{'id': 'onsx3yoyc6-aaa9-472e-b8fa-e3cc10e0432f', 'language_model': 'assemblyai_default', 'acoustic_model': 'assemblyai_default', 'language_code': 'en_us', 'status': 'completed', 'text': 'How is your processing with Python?', 'words': [{'text': 'How', 'start': 730, 'end': 822, 'confidence': 0.38754, 'speaker': None},

Our main focus in the response is 'text', because it contains the transcript. From the response, we can see that our transcript is 'How is your processing with Python?'
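Besides 'text', the completed response includes a 'words' list with per-word timings and confidence scores. The sketch below pulls both out of a dictionary modeled on the excerpt above (keeping only the single word entry shown). It assumes 'start' and 'end' are in milliseconds, which matches AssemblyAI's documentation as we understand it; word_timings is a hypothetical helper name.

```python
# Dictionary modeled on the completed response excerpt above
response = {
    'status': 'completed',
    'text': 'How is your processing with Python?',
    'words': [
        {'text': 'How', 'start': 730, 'end': 822, 'confidence': 0.38754, 'speaker': None},
    ],
}


def word_timings(response):
    """Return (word, start_seconds, end_seconds) tuples.

    Assumes 'start' and 'end' are given in milliseconds.
    """
    return [(w['text'], w['start'] / 1000, w['end'] / 1000) for w in response['words']]


print(response['text'])        # How is your processing with Python?
print(word_timings(response))  # [('How', 0.73, 0.822)]
```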

Now that we have the transcript, let's save it.

Saving the transcript

We will now write the transcript into a text file and then print 'File successfully saved!' to confirm that the file has been saved.
The text file will be saved in the working directory.

response = polling_response.json()
with open('transcript.txt', 'w', encoding='utf-8') as f:
    f.write(response['text'])
print('File successfully saved!')
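If you transcribe several files, a fixed name like transcript.txt gets overwritten each time. Here is a minimal sketch that names the output after the audio file instead. The helper save_transcript and this naming convention are our own; it also refuses to write when 'text' is None, which is what we would expect if the job did not complete.

```python
from pathlib import Path


def save_transcript(text, audio_path, out_dir='.'):
    """Write text to <audio file name without extension>.txt in out_dir and return the path."""
    if text is None:
        raise ValueError("No transcript text; check that the status is 'completed'")
    out_path = Path(out_dir) / (Path(audio_path).stem + '.txt')
    out_path.write_text(text, encoding='utf-8')
    return out_path
```

For example, save_transcript(response['text'], filename) would save the transcript of meeting.mp3 as meeting.txt in the working directory.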

Conclusion

In this article, we've learnt how to transcribe an audio file with Python and looked at all the steps involved: uploading the audio, submitting it for transcription, polling for the result and saving the transcript.
Apart from AssemblyAI, there are other platforms, such as Deepgram, that can be used for converting speech to text.

Credits

How to Transcribe Audio Files with Python (AssemblyAI)
