Voice to Text Chatbot.

This blog is the second part of a two-part chatbot tutorial series. Check out the first part here.

In this blog, I'm going to walk you through how to add speech-to-text and text-to-speech features to our chatbot :). To build this, I'm using the Web Speech API, a browser API documented on MDN (Mozilla's developer docs). It covers both directions: SpeechRecognition turns speech into text and SpeechSynthesis turns text into speech. There's also the Google Cloud Speech-to-Text API, but I won't go into that today. Okay, let's dive in!

Let's add a microphone icon (you can choose any icon you want) to our chatbot input so users know about the new feature:

<InputGroup.Append>
    {/* Clicking the microphone starts speech recognition */}
    <img
        src='https://img.icons8.com/dusk/64/000000/microphone.png'
        alt='microphone-icon'
        className="mb-2 voice-chat-btn"
        onClick={() => handleVoice(recognition)}
    />
</InputGroup.Append>

[Screenshot: our current chatbot]

This button listens for a click event and, as you probably spotted, calls a handleVoice() function whenever the user clicks the microphone. The idea is that when the user clicks the button, our bot knows to start listening to the user's voice and convert it to text. First, let's initialize speech recognition with the Web Speech API:

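// Chrome and Safari expose the constructor under a webkit prefix
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
// Browsers without speech recognition support leave both undefined
const recognition = SpeechRecognition ? new SpeechRecognition() : null;
if (recognition) recognition.lang = 'en-US'; // recognize American English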

Here is how the MDN docs define SpeechRecognition:

"The SpeechRecognition interface of the Web Speech API is the controller interface for the recognition service; this also handles the SpeechRecognitionEvent sent from the recognition service."

This is the core of our speech-to-text conversion. Besides that, SpeechRecognition offers several methods (start(), stop(), abort()) and properties (lang, grammars, continuous, etc.) that we can use. For this chatbot, I'm only using the start() method, the onresult event handler, and the lang property, which sets English as the chatbot's language. Let's implement our handleVoice() function, which converts our voice to text:
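If you want more control than `lang` alone, the other SpeechRecognition properties can be set in one place. The sketch below is a minimal example, assuming a hypothetical helper named `configureRecognition` with defaults chosen only for illustration. The property names (`lang`, `continuous`, `interimResults`, `maxAlternatives`) are standard Web Speech API properties; the helper takes the recognition object as a parameter, so a plain object can stand in for it outside the browser.

```javascript
// Hypothetical helper: applies common SpeechRecognition settings in one place.
// The defaults below are illustrative, not required values.
function configureRecognition(recognition, options = {}) {
  const {
    lang = 'en-US',          // BCP 47 language tag to recognize
    continuous = false,      // false: a single result per recognition session
    interimResults = false,  // false: report only final, not in-progress, results
    maxAlternatives = 1,     // number of candidate transcripts per result
  } = options;

  recognition.lang = lang;
  recognition.continuous = continuous;
  recognition.interimResults = interimResults;
  recognition.maxAlternatives = maxAlternatives;
  return recognition;
}

// In the browser you would pass the real object, e.g.
//   configureRecognition(recognition, { interimResults: true });
// Here a plain object stands in so the helper can run anywhere.
const fakeRecognition = {};
configureRecognition(fakeRecognition, { interimResults: true });
console.log(fakeRecognition);
```

Keeping the settings in one function makes it easy to switch, say, to continuous dictation later without touching handleVoice().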

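const handleVoice = (recognition) => {
    if (!recognition) return; // speech recognition isn't supported in this browser

    recognition.start(); // start listening to the microphone

    recognition.onresult = (event) => {
        // resultIndex is the first result that changed in this event
        const resultIndx = event.resultIndex;
        // [0] is the first alternative for that result
        const transcript = event.results[resultIndx][0].transcript;
        // Functional update reads the latest state, not the value
        // captured when handleVoice was created
        setUserHistory((prevHistory) => [transcript, ...prevHistory]);
        matchReply(transcript);
    };
};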

In this function:

recognition.start(): starts the speech recognition service listening for audio.
recognition.onresult: an event handler that runs when the service returns a result, i.e. the recognized word or phrase.
setUserHistory(): saves the transcript to our userHistory state.
matchReply(): generates the corresponding bot reply for the transcript.
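The handler reads only `event.results[event.resultIndex][0]`. If you enable `continuous` or `interimResults`, one event can carry several results, some of them not final yet. Below is a minimal sketch, assuming a hypothetical helper `getFinalTranscript`: it walks every result from `resultIndex` onward, keeps only those marked `isFinal`, and takes the first alternative of each, as the handler above does. Plain arrays stand in for the browser's `SpeechRecognitionResultList` so it runs outside the browser.

```javascript
// Hypothetical helper: joins the final transcripts from a SpeechRecognitionEvent.
// It reads only standard fields: resultIndex, results[i].isFinal,
// and results[i][0].transcript (the first alternative of each result).
function getFinalTranscript(event) {
  const parts = [];
  // Results before resultIndex were already delivered by earlier events
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      parts.push(result[0].transcript.trim());
    }
  }
  return parts.join(' ');
}

// Builds a stand-in for a SpeechRecognitionResult: an indexable list plus isFinal
function fakeResult(transcript, isFinal) {
  const result = [{ transcript, confidence: 0.9 }];
  result.isFinal = isFinal;
  return result;
}

const fakeEvent = {
  resultIndex: 1,
  results: [
    fakeResult('old phrase', true), // handled by an earlier event
    fakeResult('hello bot', true),
    fakeResult(' how are', false),  // interim result, may still change
  ],
};

console.log(getFinalTranscript(fakeEvent));
```

Inside onresult you could then call matchReply() only when the returned string is non-empty, so interim results never trigger a bot reply.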

Now our bot can recognize our speech and reply to it, but it isn't talking back to us yet! Let's add text-to-speech so that our bot can hold a full conversation with us:

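const speak = (string) => {
    const u = new SpeechSynthesisUtterance();
    // getVoices() may return [] until the browser has loaded its voices
    const allVoices = speechSynthesis.getVoices();
    // "Alex" is a macOS voice; null falls back to the browser's default voice
    u.voice = allVoices.find((voice) => voice.name === "Alex") || null;
    u.text = string;
    u.lang = "en-US";
    u.volume = 1; // range 0 to 1
    u.rate = 1;   // range 0.1 to 10
    u.pitch = 1;  // range 0 to 2
    speechSynthesis.speak(u);
};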
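Looking up a single voice by name is fragile: "Alex" ships with macOS, so other platforms won't have it, and `speechSynthesis.getVoices()` can return an empty list until the browser fires `voiceschanged`. A minimal sketch, assuming a hypothetical `pickVoice` helper, prefers a voice by name, falls back to the first voice for the requested language, and finally returns `null`, which tells the browser to use its default voice. The voice objects below are plain stand-ins with the `name` and `lang` fields that real `SpeechSynthesisVoice` objects have.

```javascript
// Hypothetical helper: choose a speech synthesis voice with fallbacks.
// Works on anything shaped like SpeechSynthesisVoice ({ name, lang }).
function pickVoice(voices, preferredName, lang) {
  // 1. Exact name match, e.g. "Alex" on macOS
  const byName = voices.find((voice) => voice.name === preferredName);
  if (byName) return byName;
  // 2. Any voice for the requested language, e.g. "en-US"
  const byLang = voices.find((voice) => voice.lang === lang);
  if (byLang) return byLang;
  // 3. null lets the browser use its default voice
  return null;
}

// Stand-ins for the list speechSynthesis.getVoices() returns
const voices = [
  { name: 'Amelie', lang: 'fr-CA' },
  { name: 'Samantha', lang: 'en-US' },
  { name: 'Alex', lang: 'en-US' },
];

console.log(pickVoice(voices, 'Alex', 'en-US').name);
console.log(pickVoice(voices, 'Daniel', 'en-US').name);
console.log(pickVoice([], 'Alex', 'en-US'));
```

In the browser you might write `u.voice = pickVoice(speechSynthesis.getVoices(), "Alex", "en-US")` and repeat the lookup in a `speechSynthesis.onvoiceschanged` handler if the first call comes back empty.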

Then, in matchReply(), call the new speak() function so the bot reads its reply aloud:

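const matchReply = (userInput) => {
    // ... (existing logic that produces botMsg)

    // Functional update so we never prepend to a stale botHistory
    setBotHistory((prevHistory) => [botMsg, ...prevHistory]);
    speak(botMsg); // read the reply aloud
};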
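With speak() called from matchReply(), the bot talks back, but the user still has to click the microphone before every turn. One option for hands-free turn-taking (not part of the original chatbot) is to restart recognition once the bot finishes speaking. The sketch below assumes a hypothetical `speakThenListen` helper; it uses the real `onend` event handler of `SpeechSynthesisUtterance` and `speechSynthesis.cancel()`, with the browser objects passed in so stubs can trace the flow outside the browser.

```javascript
// Hypothetical helper: speak a reply, then start listening again once the
// bot has finished talking. In the browser, pass window.speechSynthesis,
// the recognition object, and SpeechSynthesisUtterance.
function speakThenListen(text, { synth, recognition, Utterance }) {
  const utterance = new Utterance(text);
  utterance.lang = 'en-US';
  // onend fires when the utterance has finished being spoken
  utterance.onend = () => recognition.start();
  // Drop anything still queued so replies don't pile up
  synth.cancel();
  synth.speak(utterance);
  return utterance;
}

// Minimal stand-ins so the flow can be traced outside the browser
class FakeUtterance {
  constructor(text) {
    this.text = text;
  }
}
const calls = [];
const fakeSynth = {
  cancel: () => calls.push('cancel'),
  // Pretend the speech ends immediately
  speak: (u) => {
    calls.push(`speak:${u.text}`);
    u.onend();
  },
};
const stubRecognition = { start: () => calls.push('listen') };

speakThenListen('Hi, how can I help?', {
  synth: fakeSynth,
  recognition: stubRecognition,
  Utterance: FakeUtterance,
});
console.log(calls);
```

Starting recognition only after `onend` also keeps the microphone from picking up the bot's own voice.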

Kim Nguyen
