Hackathon submission - Interactive conversation scripts for language learning

MiguelMJ

Posted on April 3, 2022


Overview of My Submission

My second submission consists of a tool designed for language teachers and students. Recorded conversations are a common and valuable resource in language lessons, so I thought it would be useful not only to have a transcription to read while you listen, but also to make it interactive: showing phrase-by-phrase translations and letting you quickly navigate the audio to replay specific parts of the conversation.

Cover image by Julia Filirovska from Pexels

Submission Category

Accessibility Advocates
Wacky Wildcards

Link to Code on GitHub

MiguelMJ / Scripter

Create interactive scripts from recordings for language learning. Deepgram + DEV hackathon submission.

Scripter


Create interactive HTML5 pages with a script from a recorded conversation. Powered by Deepgram. Requires an API key from Deepgram. Submission for the Deepgram + DEV hackathon, 2022.

You might want to read the submission post, where you will also find a demo.

Get the API key

  • Deepgram (required): Create an account at deepgram.com and get an API key.

Store it in a file named deepgramApiKey in the root folder or pass it directly in the CLI using the --deepgram-api-key argument.
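For reference, here is a minimal sketch of how that lookup could work. The flag and file names come from the CLI described above; the helper function itself is my own illustration, not necessarily how Scripter implements it:

```python
import argparse
from pathlib import Path

def get_deepgram_api_key(args):
    """Return the key from --deepgram-api-key or the deepgramApiKey file."""
    if args.deepgram_api_key:                  # CLI flag takes precedence
        return args.deepgram_api_key
    key_file = Path("deepgramApiKey")          # fallback: file in the root folder
    if key_file.is_file():
        return key_file.read_text().strip()
    raise SystemExit("No Deepgram API key found (file or --deepgram-api-key)")

parser = argparse.ArgumentParser()
parser.add_argument("--deepgram-api-key")
key = get_deepgram_api_key(parser.parse_args())
```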

Features

  • Tune the voice recognition process with Deepgram's query parameters for pre-recorded audio transcriptions, using -P|--param KEY=VALUE arguments (see the sketch after this list).
  • Downloads and results are cached to reduce redundant traffic, but you can force a fresh request with the -F|--force flag or by removing the _cache folder directly.
  • All log information goes to stderr and the program output to stdout (or to a file, with the -o|--output-file FILE option).
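As an illustration of the -P|--param mechanism, here is a minimal sketch of how KEY=VALUE pairs could become query parameters for the Deepgram request. The parameter names (diarize, punctuate, language) are real Deepgram options; the parsing helper is hypothetical:

```python
def parse_params(pairs):
    """Turn ['diarize=true', 'language=es'] into a query-parameter dict."""
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value
    return params

# e.g. -P diarize=true -P punctuate=true -P language=es on the command line
query = parse_params(["diarize=true", "punctuate=true", "language=es"])
print(query)  # {'diarize': 'true', 'punctuate': 'true', 'language': 'es'}
```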

Additional Resources / Info

This time I reused most of the code from my previous submission. In that post I forgot to give some background, so, as most of the code is the same, I will do it here.

Despite using Python for the program, I didn't want to use the Deepgram SDK. I know other participants have done the same; in my case, it's just because I'm used to making HTTP requests for lots of things, so I chose not to add more dependencies to the application. I felt that the Deepgram API is accessible enough for me.
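As a rough sketch of what such a raw request looks like, assuming Deepgram's pre-recorded /v1/listen endpoint and the requests library (the exact parameters here are my choice, not necessarily the ones Scripter sends):

```python
import requests

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"

def transcribe(audio_path, api_key, params):
    """POST a local audio file to Deepgram and return the JSON response."""
    with open(audio_path, "rb") as f:
        response = requests.post(
            DEEPGRAM_URL,
            params=params,                       # e.g. {"diarize": "true"}
            headers={
                "Authorization": f"Token {api_key}",
                "Content-Type": "audio/wav",     # match your audio format
            },
            data=f,                              # raw audio bytes as the body
        )
    response.raise_for_status()
    return response.json()
```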

Once more, I've used another API: LibreTranslate. The difference this time is that LibreTranslate is an open source API, so there are different mirrors available, and you can even set one up yourself. For that reason I allowed the user to specify which one to use with a -H|--host parameter, and I wrote a quick guide on it in the README.
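For context, a translation call against a LibreTranslate instance looks roughly like this. The /translate endpoint and its fields are part of the public LibreTranslate API; the hosts shown are only examples, and some public mirrors require their own API key:

```python
import requests

def translate(text, source, target, host="https://libretranslate.com"):
    """Translate a sentence using the given LibreTranslate mirror."""
    response = requests.post(
        f"{host}/translate",
        json={"q": text, "source": source, "target": target, "format": "text"},
    )
    response.raise_for_status()
    return response.json()["translatedText"]

# e.g. translate("¿Cómo estás?", "es", "en", host="https://libretranslate.de")
```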

I'm not really a webdev person, so the resulting HTML file might not be very polished, but I think it's more than enough for a prototype. I learned to use the HTML5 <audio> element and manipulate it via JS, which is nice. Thanks to that, and to the rich information returned by the Deepgram API, the interactive scripts have the following features (a rough sketch of how they fit together follows the list):

  • The audio is embedded in the HTML file, so it can be played directly from there (as long as the path of the source audio doesn't change).
  • Each sentence of the audio is printed in a different color, according to the speaker.
  • If you hover over a sentence, the translation to your language (if supported) appears.
  • If you click on a sentence, the audio plays only the sentence you clicked, making it easier to replay specific parts of the audio.
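To give an idea of how these features fit together, here is a hypothetical sketch (not the actual template from the repo) of the kind of markup and JS the generator could emit: each sentence carries its speaker, translation and timestamps, and a click seeks the shared <audio> element:

```python
import html

# Browser-side handler shared by all sentences: a click seeks the audio to
# the sentence's start and pauses it again when its end timestamp is reached.
JS_SNIPPET = """
const audio = document.querySelector('audio');
let stopAt = null;
document.querySelectorAll('.sentence').forEach(span => {
  span.addEventListener('click', () => {
    audio.currentTime = parseFloat(span.dataset.start);
    stopAt = parseFloat(span.dataset.end);
    audio.play();
  });
});
audio.addEventListener('timeupdate', () => {
  if (stopAt !== null && audio.currentTime >= stopAt) {
    audio.pause();
    stopAt = null;
  }
});
"""

def sentence_span(text, translation, speaker, start, end):
    """Render one sentence as a colored, hover-translated, clickable span."""
    return (
        f'<span class="sentence speaker-{speaker}" '   # CSS class gives the color
        f'title="{html.escape(translation)}" '         # translation shown on hover
        f'data-start="{start}" data-end="{end}">'      # timestamps from Deepgram
        f'{html.escape(text)}</span>'
    )
```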

Here's a little demonstration of these features.

I recorded a short sample conversation with my sister, who also provided the voice recorder. Some notes:

  • The translation on hover is not immediate because it uses the title HTML attribute, which has a small built-in delay.
  • If you speak Spanish (or have a good ear) you'll notice that the fourth utterance is not recognized as a question, so both the transcription and the translation are wrong. That's what happens with automation!

In any case, I think it's neat, and even those little mistakes made by the speech recognition and translation can be fixed manually by the teacher, while most of the work remains automatic.

Possible future improvements

  • Allow usage of output templates.
  • Add some tool for easy manual fixes.
  • Add more flexibility to the options, so users can choose different services if they want to.

I hope you like it! And if you are a language teacher or student, feel free to use it and please, give me some feedback!
