Building a Voice Assistant using Web Speech API

roopalisingh

Roopali Singh

Posted on November 5, 2021

Hi there 👋,

In this guide we will learn how to integrate a voice user interface into our web application.

We are working with React. To incorporate a Voice User Interface (VUI) we will use the Web Speech API.

For simplicity we will not focus on design.

Our aim is to build a voice assistant that recognizes what we say and answers accordingly.

The Web Speech API allows fine control and flexibility over the speech recognition capabilities in Chrome version 25 and later.

It provides two features:

  • Speech Recognition which converts speech to text.
  • Speech Synthesis which converts text to speech.
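
To make those two halves concrete, here is a minimal sketch of checking for them directly in the browser, without any package. The helper name `detectSpeechSupport` is made up for this example, and the global object is passed in as a parameter (an assumption made only so the sketch can also run outside a browser).

```javascript
// Hypothetical helper: reports which halves of the Web Speech API
// the given global object (normally `window`) exposes.
function detectSpeechSupport(globalObject) {
  // Chrome exposes speech recognition under a webkit-prefixed name.
  const Recognition =
    globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition;

  return {
    // Speech to text
    recognition: typeof Recognition === "function",
    // Text to speech
    synthesis:
      typeof globalObject.speechSynthesis === "object" &&
      globalObject.speechSynthesis !== null &&
      typeof globalObject.SpeechSynthesisUtterance === "function",
  };
}
```

In a browser you would call `detectSpeechSupport(window)` and render a fallback when either flag is false.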

1. We will start by installing two npm packages:

# for speech recognition
npm i react-speech-recognition
# for speech synthesis
npm i react-speech-kit

Now before moving on to the next step, let's take a look at some important functions of Speech Recognition.

Detecting browser support for Web Speech API

if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    // Render some fallback content
}

Turning the microphone on

SpeechRecognition.startListening();

Turning the microphone off

// It will first finish processing any speech in progress and
// then stop.
SpeechRecognition.stopListening();
// It will cancel the processing of any speech in progress.
SpeechRecognition.abortListening();

Consuming the microphone transcript

// To make the microphone transcript available in our component.
const { transcript } = useSpeechRecognition();

Resetting the microphone transcript

const { resetTranscript } = useSpeechRecognition();
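
For context, the functions above sit on top of the browser's native recognition object. The following is only a rough sketch of that lower layer, not the package's actual code. `createRecognizer` and `onPhrase` are names invented for this example, and the constructor is passed in as a parameter so the sketch can be tried with a stand-in class.

```javascript
// Hypothetical helper: wires a native-style recognizer to a callback
// that receives each recognized phrase as plain text.
function createRecognizer(RecognitionCtor, onPhrase) {
  const recognition = new RecognitionCtor();
  recognition.continuous = true; // keep listening after the user pauses
  recognition.interimResults = false; // only report finalized phrases

  recognition.onresult = (event) => {
    // event.results holds one entry per phrase; index 0 of an entry
    // is its most likely alternative.
    const latest = event.results[event.results.length - 1];
    onPhrase(latest[0].transcript.trim());
  };

  return recognition;
}
```

In Chrome you could pass `window.SpeechRecognition || window.webkitSpeechRecognition` as the constructor. The returned object has `start()`, `stop()` and `abort()` methods, which correspond roughly to `startListening()`, `stopListening()` and `abortListening()` above.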

Now we're ready to add Speech Recognition (speech to text) to our web app 🚀

2. In the App.js file, we will check whether the browser supports speech recognition and add two components, StartButton and Output.

The App.js file should look like this for now:

import React from "react";
import StartButton from "./StartButton";
import Output from "./Output";
import SpeechRecognition from "react-speech-recognition";

function App() {

  // Checking the support
  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return (
      <div>
        Browser does not support Web Speech API (Speech Recognition).
        Please download latest Chrome.
      </div>
    );
  }

  return (
    <div className="App">
      <StartButton />
      <Output />
    </div>
  );
}

export default App;

3. Next we will move to the StartButton.js file.

Here we will add a toggle button to start and stop listening.

import React, { useState } from "react";
import SpeechRecognition from "react-speech-recognition";

function StartButton() {
  const [listen, setListen] = useState(false);

  const clickHandler = () => {
    if (listen === false) {
      SpeechRecognition.startListening({ continuous: true });
      setListen(true);
      // The default value for continuous is false, meaning that
      // when the user stops talking, speech recognition will end. 
    } else {
      SpeechRecognition.abortListening();
      setListen(false);
    }
  };

  return (
    <div>
      <button onClick={clickHandler}>
        <span>{listen ? "Stop Listening" : "Start Listening"} 
        </span>
      </button>
    </div>
  );
}

export default StartButton;

4. Now in the Output.js file, we will use the useSpeechRecognition React hook.

useSpeechRecognition gives a component access to a transcript of speech picked up from the user's microphone.

import React, { useState } from "react";
import { useSpeechRecognition } from "react-speech-recognition";

function Output() {
  const [outputMessage, setOutputMessage] = useState("");

  const commands = [
    // here we will write various different commands and
    // callback functions for their responses.
  ];

  const { transcript, resetTranscript } = useSpeechRecognition({ commands });

  return (
    <div>
      <p>{transcript}</p>
      <p>{outputMessage}</p>
    </div>
  );
}

export default Output;

5. Before defining the commands, we will add Speech Synthesis to our web app to convert the outputMessage to speech.

In the App.js file, we will now check for speech synthesis support.

import { useSpeechSynthesis } from "react-speech-kit";

function App() {
  const { supported } = useSpeechSynthesis();

  if (!supported) {
    return (
      <div>
        Browser does not support Web Speech API (Speech Synthesis).
        Please download latest Chrome.
      </div>
    );
  }
  // ... (rest of App unchanged)
}

export default App;

6. Now in the Output.js file, we will use the useSpeechSynthesis() React hook.

But before moving on, let's take a look at some important functions of Speech Synthesis:

  • speak(): Call to make the browser read some text.
  • cancel(): Call to make SpeechSynthesis stop reading.

We want to call the speak() function each time outputMessage changes.

So we add the following lines of code to the Output.js file:

import React, { useEffect, useState } from "react";
import { useSpeechSynthesis } from "react-speech-kit";

function Output() {
  const [outputMessage, setOutputMessage] = useState("");
  const { speak, cancel } = useSpeechSynthesis();

  // speak() is called each time outputMessage changes
  useEffect(() => {
    speak({
      text: outputMessage,
    });
  }, [outputMessage]);
  // ... (rest of Output unchanged)
}

export default Output;
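
The effect above also runs on the first render, when outputMessage is still an empty string. One way to handle that, sketched here against the browser's native synthesis objects rather than the hook, is to skip empty text before speaking. `speakText` is a hypothetical helper, and both the synthesizer and the utterance constructor are passed in as parameters (an assumption for this example) so it can be tried with stand-ins.

```javascript
// Hypothetical helper: reads `text` aloud unless it is empty.
// Returns true when something was queued for speaking.
function speakText(text, synth, UtteranceCtor) {
  const trimmed = text.trim();
  if (trimmed === "") {
    return false; // nothing to say, e.g. on the very first render
  }
  synth.cancel(); // stop whatever is still being read
  synth.speak(new UtteranceCtor(trimmed));
  return true;
}
```

In a browser the call would look like `speakText(outputMessage, window.speechSynthesis, SpeechSynthesisUtterance)`. The same empty-text check could just as well wrap the hook's speak() call inside the effect.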

😃 Whoa!
Everything is now set up 🔥
The only thing left is to define our commands 👩‍🎤

7. Now we're back at our Output.js file to complete our commands.

const commands = [
  {
    // The words that match the splat (*) are passed
    // into the callback.

    command: "I am *",

    callback: (name) => {
      resetTranscript();
      setOutputMessage(`Hi ${name}. Nice name`);
    },
  },

  // DATE AND TIME
  {
    command: "What time is it",

    callback: () => {
      resetTranscript();
      setOutputMessage(new Date().toLocaleTimeString());
    },
    matchInterim: true,
    // The default value for matchInterim is false, meaning that
    // the only results returned by the recognizer are final and
    // will not change.
  },
  {
    // This example would match both:
    // 'What is the date' and 'What is the date today'

    command: 'What is the date (today)',

    callback: () => {
      resetTranscript();
      setOutputMessage(new Date().toLocaleDateString());
    },
  },

  // GOOGLING (search)
  {
    command: "Search * on google",

    callback: (gitem) => {
      resetTranscript();

      // Open a Google search for the query (gitem) in a new tab.
      // encodeURIComponent keeps spaces and symbols from breaking the URL.
      window.open(
        `https://www.google.com/search?q=${encodeURIComponent(gitem)}`,
        "_blank"
      );

      setOutputMessage(`Okay. Googling ${gitem}`);
    },
  },

  // CALCULATIONS
  {
    command: "Add * and *",

    callback: (numa, numb) => {
      resetTranscript();
      const num1 = parseInt(numa, 10);
      const num2 = parseInt(numb, 10);
      setOutputMessage(`The answer is: ${num1 + num2}`);
    },
  },

  // CLEAR or STOP.
  {
    command: "clear",

    callback: () => {
      resetTranscript();
      cancel();
    },
    isFuzzyMatch: true,
    fuzzyMatchingThreshold: 0.2,

    // isFuzzyMatch is false by default.
    // It determines whether the comparison between speech and
    // command is based on similarity rather than an exact match.

    // fuzzyMatchingThreshold (default is 0.8) takes values between
    // 0 (will match anything) and 1 (needs an exact match).
    //  If the similarity of speech to command is higher than this
    // value, the callback will be invoked.
  },
];
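
To get a feel for how the splat (*) and the optional (word) parts of a command behave, here is a simplified sketch that turns a command string into a regular expression. This is not the library's actual implementation. `escapeRegExp`, `commandToRegExp` and `matchCommand` are names invented for this example, and the real matching rules may differ in edge cases.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Converts a command such as "Add * and *" into a case-insensitive RegExp.
// Everything is handled in a single pass so the replacements cannot
// interfere with each other.
function commandToRegExp(command) {
  const source = command.trim().replace(
    /\s*\(([^)]*)\)|\*|[.+?^${}|[\]\\]/g,
    (match, optionalWords) => {
      if (optionalWords !== undefined) {
        // "(today)" becomes an optional, non-capturing group.
        return `(?:\\s+${escapeRegExp(optionalWords)})?`;
      }
      if (match === "*") {
        return "(.+)"; // a splat captures one or more characters
      }
      return "\\" + match; // any other special character is matched literally
    }
  );
  return new RegExp(`^${source}$`, "i");
}

// Returns the words captured by the splats, or null when nothing matches.
function matchCommand(command, transcript) {
  const result = commandToRegExp(command).exec(transcript.trim());
  return result ? result.slice(1) : null;
}
```

With this sketch, `matchCommand("Add * and *", "add 2 and 3")` gives back the two captured strings, which is the shape of the arguments the callbacks above receive.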

😃 We have successfully built a voice assistant using the Web Speech API that does as we say 🔥🔥

Note: As of May 2021, the following browsers support the Web Speech API:

  • Chrome (desktop)
  • Chrome (Android)
  • Safari 14.1
  • Microsoft Edge
  • Android webview
  • Samsung Internet

For all other browsers, you can integrate a polyfill.

Here's a demo that I have made with some styling:

I call it Aether
