Sophisticated Speech-to-Text Application

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.

What I Built

I developed a Speech-to-Text application in Taipy using AssemblyAI's Universal-2 Speech-to-Text model. The application's features are:

Transcribe spoken words into written text.
Detect multiple speakers in an audio file and attribute what was said to each speaker.
Summarize the audio with key takeaways.
Download the transcription as a text file.
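As a rough illustration of how the transcription, speaker detection, and download features could fit together, here is a minimal sketch using the AssemblyAI Python SDK. It is not the code from the repository: the helper names (format_utterances, transcribe_with_speakers, save_transcript), the "Speaker X:" line format, and the Utterance stand-in class are assumptions made for this example, and the speech model is left at the SDK default rather than set explicitly.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Utterance:
    """Stand-in for the SDK's utterance objects (only .speaker and .text are used)."""
    speaker: str
    text: str


def format_utterances(utterances):
    """Render one line per utterance, e.g. 'Speaker A: Hello.' (hypothetical format)."""
    return "\n".join(f"Speaker {u.speaker}: {u.text}" for u in utterances)


def transcribe_with_speakers(audio_path, api_key):
    """Transcribe an audio file with speaker labels enabled and return formatted text.

    Requires `pip install assemblyai` and a valid API key; the import is kept
    local so the pure helpers in this sketch work without the SDK installed.
    """
    import assemblyai as aai

    aai.settings.api_key = api_key
    config = aai.TranscriptionConfig(speaker_labels=True)
    transcript = aai.Transcriber().transcribe(audio_path, config=config)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return format_utterances(transcript.utterances or [])


def save_transcript(text, out_path):
    """Write the transcription to a UTF-8 text file for download."""
    Path(out_path).write_text(text, encoding="utf-8")
```

Calling transcribe_with_speakers("meeting.mp3", api_key) and passing the result to save_transcript would produce a downloadable text file; in a Taipy app the same text could be bound to a text control instead.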

Demo

Link to GitHub repository

Screenshots

Journey

The application was built with Taipy, a Python-based framework. Because AssemblyAI is also used from Python, integrating its Speech-to-Text model was straightforward. AssemblyAI's comprehensive documentation simplified implementing the transcription and speaker diarization features. Taipy provides the user interface, while AssemblyAI handles all the Speech-to-Text processing.
This submission also meets the criteria for the No More Monkey Business Challenge Prompt. The summarization feature uses LeMUR: a custom prompt is sent to the LLM, which generates a concise summary of the transcript.
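The LeMUR step can be sketched roughly as follows. The prompt wording and the function names (build_summary_prompt, summarize) are hypothetical, since the exact prompt used in the application is not reproduced here. The sketch assumes a transcript object returned by the AssemblyAI Python SDK, which exposes lemur.task(prompt) and returns a result with a response attribute.

```python
def build_summary_prompt(max_takeaways=5):
    """Build a custom summarization prompt (the wording is an assumption)."""
    return (
        "Provide a concise summary of the transcript, followed by up to "
        f"{max_takeaways} key takeaways as short bullet points."
    )


def summarize(transcript, max_takeaways=5):
    """Send the custom prompt to LeMUR for a completed transcript.

    `transcript` is expected to be an AssemblyAI SDK transcript object;
    anything exposing `.lemur.task(prompt)` with a `.response` string works.
    """
    result = transcript.lemur.task(build_summary_prompt(max_takeaways))
    return result.response.strip()
```

Keeping the prompt in its own function makes it easy to adjust the summary length or format without touching the API call.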
This is a solo submission; I completed all the Taipy and AssemblyAI work myself. It was an enjoyable learning experience. AssemblyAI makes building Speech-to-Text applications remarkably easy, and I will certainly use it again.


David Akim
