RepodAI: An AI-Powered Podcasting Platform with Transcription, Summarization, and Interactive Features 🎙️

Chijioke Osadebe

Posted on November 25, 2024

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text, No More Monkey Business

What I Built

I built RepodAI, an AI-powered podcasting platform built on AssemblyAI’s Universal-2 Speech-to-Text model. RepodAI is more than a transcription tool: it layers conversational intelligence and natural language processing on top of transcription, adding sentiment analysis, speaker identification, and translation to improve how podcasts are created and consumed.

Demo

Live Demo:

RepodAI

GitHub Repository:

CijeTheCreator / Repod

A power-packed podcasting platform built around AssemblyAI.

This is a Next.js project bootstrapped with create-next-app.

Getting Started

First, run the development server:

npm run dev
# or
yarn dev
# or
pnpm dev
# or
bun dev
Open http://localhost:3000 with your browser to see the result.

You can start editing the page by modifying app/page.tsx. The page auto-updates as you edit the file.

This project uses next/font to automatically optimize and load Geist, a new font family for Vercel.

Learn More

To learn more about Next.js, take a look at the following resources:

You can check out the Next.js GitHub repository - your feedback and contributions are welcome!

Deploy on Vercel

The easiest way to deploy your Next.js app is to use the Vercel Platform from the creators of Next.js.

Check out our Next.js deployment documentation for more…

Journey

RepodAI began as a vision for a sophisticated podcasting platform that brings conversational intelligence to the forefront. Leveraging AssemblyAI’s Universal-2 model as the foundation, RepodAI transforms how users interact with audio content. Here’s how I incorporated AssemblyAI’s Speech-to-Text capabilities into this project:

Key Features

  1. Audio Upload and Transcription
  2. Profanity Filtering
  3. Speaker Identification and Sentiment Analysis
  4. Chapter Segmentation and Summarization
  5. Advanced Search and Navigation
  6. AI-Powered Interaction supercharged by LeMUR
  7. Multi-Language Translation
  8. Dynamic and Interactive Player
  9. Customizable Themes and Mobile Responsiveness
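
As an illustration of how feature 5 could sit on top of the transcript, here is a minimal sketch of searching sentences and turning a match into a seek position for the player. The `Sentence` shape follows the sentence objects AssemblyAI returns (`text`, with `start` and `end` in milliseconds); `SearchHit` and `searchTranscript` are hypothetical names, and this is not necessarily how RepodAI implements search.

```typescript
// Minimal shape of a transcript sentence; `start` and `end` are in milliseconds.
interface Sentence {
  text: string;
  start: number;
  end: number;
  speaker?: string | null;
}

// Hypothetical result type: where the match is and where to seek the player.
interface SearchHit {
  sentenceIndex: number;
  startSeconds: number;
  text: string;
}

// Hypothetical helper: case-insensitive substring search over the sentences.
function searchTranscript(sentences: Sentence[], query: string): SearchHit[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return []; // a blank query matches nothing
  const hits: SearchHit[] = [];
  sentences.forEach((sentence, index) => {
    if (sentence.text.toLowerCase().includes(needle)) {
      hits.push({
        sentenceIndex: index,
        startSeconds: sentence.start / 1000, // milliseconds to seconds
        text: sentence.text,
      });
    }
  });
  return hits;
}
```

A player could then set its current time to `startSeconds` of the selected hit. A substring match keeps the sketch short; a real search would likely want tokenization or fuzzy matching.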

The Prompts I Worked On

Sophisticated Speech-to-Text

I utilized AssemblyAI’s transcription API for two main use cases:

Transcribing the Main Podcast

This step involved converting the uploaded audio file into text, ensuring that the podcast's spoken content was accurately captured and ready for processing by features such as summarization and sentiment analysis.

// `client` (the AssemblyAI SDK client), `TOverallForm`, `PiiPolicy`, and
// `updatePodcastTranscriptionId` come from elsewhere in the project and its imports.
async function getTranscript(
  audioUrl: string,
  podcastId: number,
  { basic_details, redaction, speakers }: TOverallForm,
): Promise<any> {
  // Keep only the PII policies the user switched on in the upload form.
  const redactionList = Object.keys(redaction).filter(
    (key) => redaction[key as keyof typeof redaction],
  ) as PiiPolicy[];
  // Request the transcript together with speaker labels, chapters, and sentiment.
  const transcript = await client.transcripts.transcribe({
    audio: audioUrl,
    speaker_labels: true,
    auto_chapters: true,
    redact_pii_audio: true,
    filter_profanity: basic_details.filter_profanity,
    redact_pii_policies: redactionList,
    sentiment_analysis: true,
    format_text: true,
    // One expected speaker per comma-separated name in the form.
    speakers_expected: speakers.speakers.split(",").length,
  });
  // Store the transcript ID so later features can refer back to it.
  await updatePodcastTranscriptionId(podcastId, transcript.id);
  // Return the transcript split into sentences.
  const { sentences } = await client.transcripts.sentences(transcript.id);
  return sentences;
}
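
The two pieces of form handling in `getTranscript` can also be pulled out into small pure helpers, which makes them easy to test without calling the API. This is a sketch under assumptions: `RedactionToggles`, `enabledPolicies`, and `countSpeakers` are hypothetical names, and the form is assumed to hold one boolean per policy name.

```typescript
// Hypothetical form shape: one boolean toggle per PII policy name.
type RedactionToggles = Record<string, boolean>;

// Keep only the policy names whose toggle is switched on.
function enabledPolicies(redaction: RedactionToggles): string[] {
  return Object.keys(redaction).filter((key) => redaction[key]);
}

// Count speakers in a comma-separated list, ignoring blank entries.
// "".split(",").length is 1, so an empty field would otherwise count as one speaker.
function countSpeakers(commaSeparated: string): number {
  return commaSeparated
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0).length;
}

// Example values; the policy names are ones AssemblyAI documents for PII redaction.
const policies = enabledPolicies({
  person_name: true,
  phone_number: false,
  email_address: true,
});
const speakerCount = countSpeakers("Ada, Grace ,");
console.log(policies, speakerCount);
```

If `countSpeakers` returns 0, it is probably safer to leave `speakers_expected` out of the request than to send 0.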

Converting Asked Questions to Text

Questions asked to RepodAI's chatbot (via voice input) are transcribed into text before being processed by LeMUR, enabling precise and context-aware responses.
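
The flow described above could look roughly like the sketch below. `SpeechClient` is a minimal stand-in interface that mirrors the two AssemblyAI JavaScript SDK calls involved (`client.transcripts.transcribe` and `client.lemur.task`). `buildQuestionPrompt`, `answerSpokenQuestion`, and the prompt wording are assumptions for illustration, not the project's actual code.

```typescript
// Minimal structural interface covering only the two SDK calls used here.
interface SpeechClient {
  transcripts: {
    transcribe(params: { audio: string }): Promise<{ id: string; text?: string | null }>;
  };
  lemur: {
    task(params: { transcript_ids: string[]; prompt: string }): Promise<{ response: string }>;
  };
}

// Hypothetical prompt builder; the wording is an assumption.
function buildQuestionPrompt(question: string): string {
  return [
    "Answer the listener's question using only the podcast transcript.",
    "If the transcript does not cover it, say so.",
    `Question: ${question.trim()}`,
  ].join("\n");
}

// Transcribe the recorded question, then ask LeMUR about the podcast transcript.
async function answerSpokenQuestion(
  client: SpeechClient,
  podcastTranscriptId: string,
  questionAudioUrl: string,
): Promise<string> {
  const question = await client.transcripts.transcribe({ audio: questionAudioUrl });
  if (!question.text) {
    throw new Error("The recorded question could not be transcribed.");
  }
  const { response } = await client.lemur.task({
    transcript_ids: [podcastTranscriptId],
    prompt: buildQuestionPrompt(question.text),
  });
  return response;
}
```

Passing the client in as a parameter keeps the function testable with a fake client, and the podcast transcript ID is the one stored at upload time.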

No More Monkey Business

I also employed LeMUR for the following key features:

  1. RepodAI’s Chatbot

    The chatbot generates insightful answers to user questions about the podcast by processing transcriptions of both the podcast and the user’s query.

  2. Creating the Initial Podcast Summary

    During the upload process, RepodAI uses the transcribed content to generate an initial summary of the podcast, providing a quick overview for users.
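
For the upload-time summary, one option is LeMUR's summary endpoint, sketched below. The post does not say which LeMUR call the project uses, so treat the choice of `client.lemur.summary` as an assumption. `SummaryClient`, `buildSummaryParams`, `createInitialSummary`, and the `context` and `answer_format` wording are hypothetical.

```typescript
// Request fields used here; they mirror the LeMUR summary parameters.
interface SummaryParams {
  transcript_ids: string[];
  context?: string;
  answer_format?: string;
}

// Minimal structural interface for the single SDK call used.
interface SummaryClient {
  lemur: {
    summary(params: SummaryParams): Promise<{ response: string }>;
  };
}

// Hypothetical helper: build the request for a short overview of one episode.
function buildSummaryParams(transcriptId: string, podcastTitle: string): SummaryParams {
  return {
    transcript_ids: [transcriptId],
    context: `This is an episode of a podcast titled "${podcastTitle}".`,
    answer_format: "A short paragraph of at most three sentences.",
  };
}

// Ask LeMUR for the summary and return it as plain text.
async function createInitialSummary(
  client: SummaryClient,
  transcriptId: string,
  podcastTitle: string,
): Promise<string> {
  const { response } = await client.lemur.summary(
    buildSummaryParams(transcriptId, podcastTitle),
  );
  return response.trim();
}
```

The returned text could then be saved next to the podcast record so the overview is ready as soon as the upload finishes.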

Tech Stack 🚀

  • Next.js 🖥️: For building the UI and backend.
  • ShadcnUI 🎨: Component library for consistent and elegant UI.
  • Neon Postgres 🐘: To store user-generated podcasts.
  • Three.js 🎧: For audio visualization when asking AI questions.
  • Universal-2 🗣️: Powering sophisticated speech-to-text transcription.
  • LeMUR 🤖: Intelligent LLM-powered interaction with spoken data.
  • OpenAI TTS 🗨️: For text-to-speech conversion.

References

The algorithm for RepodAI's audio visualization (shown while recording) comes from Prakhar625's audio visualiser CodePen. I altered the source code slightly to suit the design and behavior I wanted for this project.
