Streaming voice to SQL with AssemblyAI: Execute the generated SQL, use Ollama, RAG templates and vector embeddings

Ogbotemi Ogungbamila

Posted on November 25, 2024

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.

What I Built

I built an online Voice to SQL environment that converts users' recorded speech into SQL statements, with the following features:

Voice to SQL

  • Converts user speech to text, preferably SQL.
  • Optionally streams the voice currently being recorded to the server and displays the equivalent SQL statement.
  • Replaces words in the transcribed text with the SQL glyphs they stand for (e.g. 'less than' becomes '<'), using a customizable and extensible widget.
  • Shows an audio visualizer while recording, with options to pause and play.
  • Lets advanced users specify the recording bitrate for optimum results.

SQL statements execution

  • Provides an interface for switching between MySQL and PostgreSQL databases on the fly

  • Displays error details for every failed database interaction

Generation of vector embeddings using a RAG widget and PostgreSQL databases (Timescale, Neon.tech)

  • Provides a widget for obtaining embeddings of custom prompts, text, or messages from locally running Ollama models
  • Provides SELECT and INSERT SQL templates for applying generated embeddings, along with their metadata, to PostgreSQL databases that support them
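
To make the embedding workflow more concrete, here is a minimal Python sketch of what such templates could look like. It is not the app's actual code: the `documents` table, its column names, the `nomic-embed-text` model and the helper names are all assumptions made for illustration. It assumes a PostgreSQL database with the pgvector extension enabled and an Ollama instance listening on its default local port.

```python
import json
import urllib.request

# Hypothetical templates for an assumed table:
#   documents(content TEXT, metadata JSONB, embedding VECTOR)
# The %s placeholders follow the style used by common Python PostgreSQL drivers.
INSERT_TEMPLATE = (
    "INSERT INTO documents (content, metadata, embedding) "
    "VALUES (%s, %s::jsonb, %s::vector);"
)
# <=> is pgvector's cosine distance operator; smaller means more similar.
SELECT_TEMPLATE = (
    "SELECT content, metadata, embedding <=> %s::vector AS distance "
    "FROM documents ORDER BY distance LIMIT %s;"
)


def to_vector_literal(embedding):
    """Format a list of numbers as a pgvector text literal such as [0.5,1.0]."""
    return "[" + ",".join(repr(float(value)) for value in embedding) + "]"


def build_insert_params(content, metadata, embedding):
    """Return the parameter tuple that fills INSERT_TEMPLATE."""
    return (content, json.dumps(metadata), to_vector_literal(embedding))


def fetch_embedding(prompt, model="nomic-embed-text",
                    host="http://localhost:11434"):
    """Ask a locally running Ollama model for the embedding of a prompt.

    The model name and host are assumptions; use whatever is installed locally.
    """
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        host + "/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]
```

With a database cursor at hand, the INSERT could then be run as `cursor.execute(INSERT_TEMPLATE, build_insert_params(text, {"source": "voice"}, fetch_embedding(text)))`, where the metadata shown is only an example.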

Downloads

  • {query, result} object from executed queries
  • Recorded audio.
  • Option to upload {query, result} object to Pinata

Demo

Node.js server on Vercel

https://voice-sql-ai.vercel.app/

Python server for POST requests

https://voice-ai-sql-python.vercel.app/

https://voice-ai-sql-python.vercel.app/upload with {recording: <base64data>} in the POST request body

Psst: GET requests to the Python server still serve the page I copied from https://developer.mozilla.org/en-US/docs/Web/API/MediaStream_Recording_API/Using_the_MediaStream_Recording_API. It was a great, simple demo that I used to learn how to handle base64-encoded and binary data in Python, as well as how to POST it to AssemblyAI's API.

Screenshots

Enabled dark mode via browser devtools

Expanded view of widget for Voice-to-SQL

View of the other widgets for creating and using vector embeddings

Journey

Falling back to Python

Curiously enough, the Python code examples in AssemblyAI's docs worked, while the JavaScript ones in Node.js either crashed with "Not allowed" errors (via the Node.js SDK) or returned {error: null} as a response (via the API).

AssemblyAI's Speech-to-Text API

The API was very straightforward and more flexible than the Python SDK for my use case, with the following workflow:

  • Upload the binary data decoded from the base64 string to AssemblyAI to obtain a URL
  • Use the received URL, along with my API key, to request a transcription of the audio and receive the resulting JSON.
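
The two steps above can be sketched in Python roughly as follows. This is an illustrative outline using only the standard library, not the code running on the server: the function names are hypothetical, the optional data URL prefix handling is an assumption about what the browser sends, and error handling is left out.

```python
import base64
import json
import time
import urllib.request

API_BASE = "https://api.assemblyai.com/v2"


def decode_recording(recording):
    """Decode a base64 string, tolerating a 'data:...;base64,' prefix."""
    if recording.startswith("data:") and "," in recording:
        recording = recording.split(",", 1)[1]
    return base64.b64decode(recording)


def _call(url, api_key, data=None, content_type=None):
    """Send a request with the API key; POST when data is given, else GET."""
    headers = {"authorization": api_key}
    if content_type:
        headers["content-type"] = content_type
    request = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def upload_audio(audio_bytes, api_key):
    """Step 1: upload the raw audio and return the URL AssemblyAI assigns."""
    result = _call(API_BASE + "/upload", api_key, audio_bytes,
                   "application/octet-stream")
    return result["upload_url"]


def transcribe(upload_url, api_key, poll_seconds=3.0):
    """Step 2: request a transcript and poll until it finishes or fails."""
    body = json.dumps({"audio_url": upload_url}).encode("utf-8")
    job = _call(API_BASE + "/transcript", api_key, body, "application/json")
    while job["status"] not in ("completed", "error"):
        time.sleep(poll_seconds)
        job = _call(API_BASE + "/transcript/" + job["id"], api_key)
    return job  # job["text"] holds the transcript when status is "completed"
```

A request handler would chain these as `transcribe(upload_audio(decode_recording(payload["recording"]), api_key), api_key)`, where `payload` stands for the parsed POST body.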

Usage

I used AssemblyAI's Speech-to-Text to convert users' recorded speech to SQL statements, which are then refined further as follows:

  • Words in the received text are replaced with the glyphs they represent in SQL.
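
As a rough illustration of this replacement step, the Python sketch below maps spoken phrases to SQL glyphs. The phrase table and function name are made up for the example; the widget in the app is customizable and its real mapping may differ.

```python
import re

# Hypothetical phrase-to-glyph table; extend or replace it as needed.
DEFAULT_GLYPHS = {
    "less than or equal to": "<=",
    "greater than or equal to": ">=",
    "not equal to": "<>",
    "less than": "<",
    "greater than": ">",
    "equal to": "=",
    "equals": "=",
    "star": "*",
}


def replace_glyphs(text, glyphs=DEFAULT_GLYPHS):
    """Replace every known spoken phrase in text with its SQL glyph."""
    # Longest phrases first, so 'less than or equal to' is not cut to '<'.
    phrases = sorted(glyphs, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(phrase) for phrase in phrases) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: glyphs[match.group(1).lower()], text)


print(replace_glyphs("select star from users where age less than or equal to 30"))
```

Matching is case-insensitive and bounded by word edges, so a transcript such as "price Greater Than 10" is rewritten while words that merely contain a phrase (for example "starter") are left alone.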

This submission doesn't quite qualify for the additional prompts, since I didn't use them, but I did something similar to the other two in the web app I created.

Issues that thwarted the work

Credits issue with real-time and LeMUR

I was not allowed to use the other tools, LeMUR and real-time streaming, with the free credits: I was advised to buy credits despite having over $40 worth of free credits. That is why I implemented something similar to them, alongside Speech-to-Text, in this web app: https://voice-sql-ai.vercel.app/

Real-time streaming

I was going to implement voice to SQL as a stream, but the credits issue got in the way, so I got creative and implemented it in code on top of Speech-to-Text instead.

Final Thoughts

This was a fun project that broadened my knowledge of using Python as a server alongside Node.js. It also led me to add more functionality to the SQL playground I had built.
Finally, it made me explore how to get creative with handling and sending binary media data in browsers.

Thank you for reading!
