Sophisticated Speech-to-Text Submission Template, The AssemblyAI challenge.

devmercy

Mercy

Posted on November 22, 2024

Sophisticated Speech-to-Text Submission Template, The AssemblyAI challenge.

This is a submission for the AssemblyAI Challenge : Sophisticated Speech-to-Text.

What I Built

A Speech-to-Text Transcription Web Application using Flask for the backend and AssemblyAI's API for real-time audio transcription. The frontend, built with HTML, CSS, and jQuery, offers an interactive interface for users to control the transcription process and view transcribed text in real-time.

Demo

Here is the link to my app

Image description

Image description

Image description

Journey

Key Features

Real-Time Transcription:

  • Utilizes AssemblyAI's real-time API to process live audio input from the user's microphone and convert it to text.
  • Supports both partial and final transcripts.

Web Interface:

  • Clean and intuitive design with buttons to start and stop transcription.
  • Displays the transcribed text dynamically in a formatted
     block.

Flask Backend:

  • Handles routes for starting (/start), stopping (/stop), and retrieving the transcript (/transcript).
  • Runs transcription in a separate thread to ensure non-blocking operations.

Polling Mechanism:

  • Implements a JavaScript-based polling system using jQuery to fetch the latest transcribed text every second.

Customizable Word Boost:

  • Boosts recognition accuracy for specific words like "AWS," "Azure," and "Google Cloud."

Responsive Design:

  • Ensures usability across devices with a centralized, easy-to-use layout.

Technology Stack

Backend:

  • Python (Flask): Manages the web server and API interactions.
  • AssemblyAI API: Handles speech-to-text transcription.
import assemblyai as aai
from flask import Flask, render_template, jsonify
import os
from dotenv import load_dotenv
import threading

app = Flask(__name__)
load_dotenv()

aai.settings.api_key = os.getenv('API_KEY')

transcriber = None
transcribed_text = ""

def on_open():
    print("Transcription started!")

def on_data(transcript: aai.RealtimeTranscript):
    global transcribed_text
    if not transcript.text:
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        transcribed_text += transcript.text + "\n"
        print("Transcribed:", transcript.text)  # Verify text here
    else:
        print("Received partial:", transcript.text)


def on_error(error):
    print("Error:", error)

def on_close():
    print("Transcription stopped!")

def start_transcription():
    global transcriber
    microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)
    transcriber = aai.RealtimeTranscriber(
        encoding=aai.AudioEncoding.pcm_mulaw,
        sample_rate=16_000,
        word_boost=["aws", "azure", "google cloud"],
        end_utterance_silence_threshold=500,
        on_open=on_open,
        on_data=on_data,
        on_error=on_error,
        on_close=on_close,
    )

    for audio_data in microphone_stream:
        if transcriber is not None:
            transcriber.stream(audio_data)
        else:
            break

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/start')
def start():
    global transcribed_text
    transcribed_text = ""  # Clear previous transcript
    threading.Thread(target=start_transcription).start()
    return jsonify({"message": "Transcription started!"})


@app.route('/stop')
def stop():
    global transcriber
    if transcriber is not None:
        transcriber.close()
        transcriber = None
        print("Transcriber closed")
    return jsonify({"message": "Transcription stopped!"})

@app.route('/transcript')
def transcript():
    global transcribed_text
    return jsonify({"transcript": transcribed_text})


if __name__ == "__main__":
    app.run(debug=True)
Enter fullscreen mode Exit fullscreen mode

Frontend:

  • HTML & CSS: Provides structure and styling for the user interface.
  • jQuery: Handles AJAX requests for starting, stopping, and polling the transcription.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Speech to Text App</title>
    <script src="https://code.jquery.com/jquery-3.5.1.min.js"></script>
    <style>
        body {
            margin: 0;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh; /* Full viewport height */
            font-family: Arial, sans-serif;
            background-color: #f4f4f4; /* Light background for better readability */
        }

        #container {
            text-align: center;
            background: #ffffff;
            padding: 20px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
            border-radius: 8px;
        }

        button {
            margin: 10px;
            padding: 10px 20px;
            font-size: 16px;
            border: none;
            border-radius: 5px;
            background-color: #007bff;
            color: white;
            cursor: pointer;
        }

        button:hover {
            background-color: #0056b3;
        }

        pre {
            padding: 10px;
            background-color: #e9ecef;
            border-radius: 5px;
            overflow: auto;
        }
    </style>
</head>
<body>
    <div id="container">
        <h1>Speech-to-Text Transcription</h1>
        <button id="start">Start Transcription</button>
        <button id="stop">Stop Transcription</button>
        <h2>Transcribed Text:</h2>
        <pre id="transcript"></pre>
    </div>

    <script>
        $(document).ready(function() {
            let pollInterval; // Variable to hold the interval ID

            // Start transcription
            $('#start').click(function() {
                $.get('/start', function(data) {
                    console.log(data.message);

                    // Start polling for transcripts if not already polling
                    if (!pollInterval) {
                        pollInterval = setInterval(function() {
                            $.ajax({
                                type: 'GET',
                                url: '/transcript',
                                dataType: 'json',
                                success: function(data) {
                                    console.log(data);
                                    if (data && data.transcript) {
                                        $('#transcript').text(data.transcript);
                                    } else {
                                        $('#transcript').text('No transcription available yet.');
                                    }
                                },
                                error: function(err) {
                                    console.error('Error fetching transcript:', err);
                                }
                            });
                        }, 1000);
                    }
                });
            });

            // Stop transcription
            $('#stop').click(function() {
                $.get('/stop', function(data) {
                    console.log(data.message);

                    // Stop polling for transcripts
                    if (pollInterval) {
                        clearInterval(pollInterval);
                        pollInterval = null; // Reset the interval variable
                    }
                });
            });
        });
    </script>

</body>
</html>
Enter fullscreen mode Exit fullscreen mode

Audio Input:

  • AssemblyAI's MicrophoneStream: Streams audio data for real-time processing.

I utilized additional prompts to enhance the project. I employed the #FlaskWebFramework for rendering templates and returning JSON responses, and I used the #dotenv library to load environment variables from the env file. On the frontend, I implemented CSS for styling the user interface.

Lastly, I want to thank my team, @devnenyasha, and @lindiwe09, for their UI idea. If not for them my UI would have been a mess.

💖 💪 🙅 🚩
devmercy
Mercy

Posted on November 22, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

AILingo
devchallenge AILingo

November 25, 2024

subtitleGenAI subtitle generation platform
devchallenge subtitleGenAI subtitle generation platform

November 25, 2024

Podcast Content Generator
devchallenge Podcast Content Generator

November 25, 2024

MovieLens - Smart Movie Analysis Redefined
devchallenge MovieLens - Smart Movie Analysis Redefined

November 25, 2024

Sync: A real-time VIdeo Chat Application
devchallenge Sync: A real-time VIdeo Chat Application

November 25, 2024