Building a Voice Assistant using Web Speech API

Hi there👋,

In this guide, we will learn how to integrate a voice user interface into a web application.

We are working with React. To add the Voice User Interface (VUI), we will use the Web Speech API.

For simplicity we will not be focusing on design.

Our aim is to build a voice assistant which will recognize what we say and answer accordingly.

For this we are using the Web Speech API, which allows fine control and flexibility over the speech recognition capabilities in Chrome version 25 and later.

The Web Speech API provides us with two features:

Speech Recognition which converts speech to text.
Speech Synthesis which converts text to speech.
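
Both features are exposed on the browser's window object, so their presence can be checked directly. Here is a minimal sketch of such a check. The helper name detectSpeechSupport is made up for illustration, and it takes a window-like object as a parameter so it can be tried outside a browser too:

```javascript
// Sketch: report which halves of the Web Speech API a window-like object exposes.
// detectSpeechSupport is a hypothetical helper, not part of any library.
function detectSpeechSupport(win) {
  // Chrome and Safari expose speech recognition under the webkit prefix.
  const Recognition = win.SpeechRecognition || win.webkitSpeechRecognition;
  return {
    recognition: typeof Recognition === "function",
    synthesis: typeof win.speechSynthesis === "object" && win.speechSynthesis !== null,
  };
}

// In a browser you would call: detectSpeechSupport(window)
```

The npm packages used in the rest of this guide wrap checks like this for us, so we will not need to call the raw API ourselves.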

1. We will start by installing two npm packages:

// for speech recognition
npm i react-speech-recognition
// for speech synthesis
npm i react-speech-kit

Now before moving on to the next step, let's take a look at some important functions of Speech Recognition.

Detecting browser support for Web Speech API

if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    // Render some fallback content
}

Turning the microphone on

SpeechRecognition.startListening();

Turning the microphone off

// It will first finish processing any speech in progress and
// then stop.
SpeechRecognition.stopListening();
// It will cancel the processing of any speech in progress.
SpeechRecognition.abortListening();

Consuming the microphone transcript

// To make the microphone transcript available in our component.
const { transcript } = useSpeechRecognition();

Resetting the microphone transcript

const { resetTranscript } = useSpeechRecognition();
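
The transcript we get from the hook is assembled from results that the browser emits while we speak. Each result is flagged as final or still interim. As a rough sketch of that idea (splitTranscript is a made-up helper name, and the plain objects passed in only imitate the browser's result list), the two kinds could be separated like this:

```javascript
// Sketch: split recognition results into final and interim text.
// `results` imitates the browser's SpeechRecognitionResultList: each entry
// has an `isFinal` flag and its best alternative at index 0.
function splitTranscript(results) {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < results.length; i++) {
    const best = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += best;
    } else {
      interimText += best;
    }
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}
```

Interim text can still change as the recognizer hears more, which is why commands are matched against final results unless we ask otherwise.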

Now we're ready to add Speech Recognition (speech to text) in our web app 🚀

2. In the App.js file, we will check whether the browser supports speech recognition and add two components, StartButton and Output.

The App.js file should look like this for now:

import React from "react";
import StartButton from "./StartButton";
import Output from "./Output";
import SpeechRecognition from "react-speech-recognition";

function App() {
  // Checking the support
  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return (
      <div>
        Browser does not support Web Speech API (Speech Recognition).
        Please download latest Chrome.
      </div>
    );
  }

  return (
    <div className="App">
      <StartButton />
      <Output />
    </div>
  );
}

export default App;

3. Next we will move to the StartButton.js file.

Here we will add a toggle button to start and stop listening.

import React, { useState } from "react";
import SpeechRecognition from "react-speech-recognition";

function StartButton() {
  const [listen, setListen] = useState(false);

  const clickHandler = () => {
    if (listen === false) {
      // The default value for continuous is false, meaning that
      // when the user stops talking, speech recognition will end.
      // Passing true keeps listening until we stop it ourselves.
      SpeechRecognition.startListening({ continuous: true });
      setListen(true);
    } else {
      SpeechRecognition.abortListening();
      setListen(false);
    }
  };

  return (
    <div>
      <button onClick={clickHandler}>
        <span>{listen ? "Stop Listening" : "Start Listening"} 
        </span>
      </button>
    </div>
  );
}

export default StartButton;
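
If you want to try the toggle logic without a browser or a microphone, one option is to pull it out into a plain function. This is only a sketch: toggleListening is a made-up name, and the recognizer argument is assumed to be any object with the startListening and abortListening methods shown earlier.

```javascript
// Sketch: the button's toggle logic as a plain function.
// Returns the new "listening" state after toggling.
function toggleListening(isListening, recognizer) {
  if (isListening) {
    // Cancel any speech still being processed and turn the microphone off.
    recognizer.abortListening();
    return false;
  }
  // Keep listening until the user toggles again.
  recognizer.startListening({ continuous: true });
  return true;
}
```

Inside the component, clickHandler could then be reduced to setListen(toggleListening(listen, SpeechRecognition)).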

4. Now in the Output.js file, we will use the useSpeechRecognition React hook.

useSpeechRecognition gives a component access to a transcript of speech picked up from the user's microphone.

import React, { useState } from "react";
import { useSpeechRecognition } from "react-speech-recognition";

function Output() {
  const [outputMessage, setOutputMessage] = useState("");

  const commands = [
    // here we will write various different commands and
    // callback functions for their responses.
  ];

  const { transcript, resetTranscript } = useSpeechRecognition({ commands });

  return (
    <div>
      <p>{transcript}</p>
      <p>{outputMessage}</p>
    </div>
  );
}

export default Output;

5. Before defining the commands, we will add Speech Synthesis in our web app to convert the outputMessage to speech.

In the App.js file, we will now check whether the browser supports speech synthesis.

import { useSpeechSynthesis } from "react-speech-kit";

function App() {
  const { supported } = useSpeechSynthesis();

  if (!supported) {
    return (
      <div>
        Browser does not support Web Speech API (Speech Synthesis).
        Please download latest Chrome.
      </div>
    );
  }
.
.
.
export default App;

6. Now in the Output.js file, we will use the useSpeechSynthesis() React hook.

But before moving on, we first take a look at some important functions of Speech Synthesis:

speak(): Call to make the browser read some text.
cancel(): Call to make SpeechSynthesis stop reading.

We want to call the speak() function each time the outputMessage is changed.

So we would add the following lines of code in Output.js file:

import React, { useEffect, useState } from "react";
import { useSpeechSynthesis } from "react-speech-kit";

function Output() {
  const [outputMessage, setOutputMessage] = useState("");
  const { speak, cancel } = useSpeechSynthesis();

  // The speak() will get called each time outputMessage is changed 
  useEffect(() => {
      speak({
        text: outputMessage,
      });
  }, [outputMessage]);
.
.
.
}

export default Output;
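
For the curious, the hook's speak() and cancel() sit on top of the browser's own speechSynthesis object. Here is a minimal sketch of speaking a message with the raw API. The helper name speakText is made up, and the synthesizer and utterance constructor are passed in as parameters (an assumption made so the function can be exercised with stand-ins). Skipping empty text and cancelling earlier speech first are design choices of this sketch, not something the hook is known to do.

```javascript
// Sketch: speak a message with the browser's speech synthesis objects.
// Returns true if something was queued for speaking, false otherwise.
function speakText(text, synth, Utterance) {
  // Nothing to say, e.g. the initial empty outputMessage.
  if (!text || !text.trim()) return false;
  // Drop anything still queued so the new answer is not delayed.
  synth.cancel();
  synth.speak(new Utterance(text));
  return true;
}

// Browser usage:
// speakText("Hello", window.speechSynthesis, window.SpeechSynthesisUtterance);
```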

😃Whoa!
Everything is now set up 🔥
The only thing left is to define our commands 👩🎤

7. Now we're back at our Output.js file to complete our commands.

const commands = [
  {
    // The words that match the splat (*) are passed into the
    // callback, one argument per splat.

    command: "I am *",

    callback: (name) => {
      resetTranscript();
      setOutputMessage(`Hi ${name}. Nice name`);
    },
  },

  // DATE AND TIME
  {
    command: "What time is it",

    callback: () => {
      resetTranscript();
      setOutputMessage(new Date().toLocaleTimeString());
    },
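    matchInterim: true,
    // The default value for matchInterim is false, meaning that
    // the command is only matched against final results, which
    // will not change. Setting it to true also matches interim
    // results, so the response is faster but false positives
    // are more likely.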
  },
  {
    // This example would match both:
    // 'What is the date' and 'What is the date today'

    command: 'What is the date (today)',

    callback: () => {
      resetTranscript();
      setOutputMessage(new Date().toLocaleDateString());
    },
  },

  // GOOGLING (search)
  {
    command: "Search * on google",

    callback: (gitem) => {
      resetTranscript();

      // Open a Google search for the query (gitem) in a new tab.
      // encodeURIComponent keeps spaces and symbols from breaking the URL.
      window.open(
        `https://www.google.com/search?q=${encodeURIComponent(gitem)}`,
        "_blank"
      );

      setOutputMessage(`Okay. Googling ${gitem}`);
    },
  },

  // CALCULATIONS
  {
    command: "Add * and *",

    callback: (numa, numb) => {
      resetTranscript();
      const num1 = parseInt(numa, 10);
      const num2 = parseInt(numb, 10);
      setOutputMessage(`The answer is: ${num1 + num2}`);
    },
  },

  // CLEAR or STOP.
  {
    command: "clear",

    callback: () => {
      resetTranscript();
      cancel();
    },
    isFuzzyMatch: true,
    fuzzyMatchingThreshold: 0.2,

    // isFuzzyMatch is false by default.
    // It determines whether the comparison between speech and
    // command is based on similarity rather than an exact match.

    // fuzzyMatchingThreshold (default is 0.8) takes values between
    // 0 (will match anything) and 1 (needs an exact match).
    //  If the similarity of speech to command is higher than this
    // value, the callback will be invoked.
  },
];
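
To get a feel for what the splat (*), the optional "(today)" and the fuzzy matching options do, here is a simplified illustration in plain JavaScript. It is not the library's actual implementation. The names commandToRegExp, matchCommand and diceSimilarity are made up, optional words are assumed to be plain text, and the Dice coefficient is just one common similarity measure (the library may use something different).

```javascript
// Simplified illustration only: NOT the library's real implementation.

// Turn "Search * on google" or "What is the date (today)" into a RegExp.
function commandToRegExp(command) {
  const pattern = command.replace(
    /\s*\(([^)]*)\)|\*|[.+?^${}|[\]\\]/g,
    (match, optional) => {
      if (match === "*") return "(.+?)"; // splat: capture one or more characters
      if (optional !== undefined) return "(?:\\s*" + optional + ")?"; // optional word(s)
      return "\\" + match; // escape any other regex metacharacter
    }
  );
  return new RegExp("^\\s*" + pattern + "\\s*$", "i");
}

// Return the captured splat values, or null when the phrase does not match.
function matchCommand(command, transcript) {
  const result = commandToRegExp(command).exec(transcript);
  return result ? result.slice(1) : null;
}

// Dice coefficient over character bigrams: 0 (nothing shared) up to 1.
function diceSimilarity(a, b) {
  const bigrams = (text) => {
    const clean = text.toLowerCase().replace(/\s+/g, "");
    const grams = [];
    for (let i = 0; i < clean.length - 1; i++) grams.push(clean.slice(i, i + 2));
    return grams;
  };
  const first = bigrams(a);
  const remaining = bigrams(b);
  const total = first.length + remaining.length;
  if (total === 0) return 0; // strings shorter than two characters have no bigrams
  let shared = 0;
  for (const gram of first) {
    const index = remaining.indexOf(gram);
    if (index !== -1) {
      shared++;
      remaining.splice(index, 1); // each bigram may be matched only once
    }
  }
  return (2 * shared) / total;
}
```

With this sketch, matchCommand("Add * and *", "add 2 and 3") returns ["2", "3"], which is how the two numbers would reach the callback. diceSimilarity("clear", "clean") gives 0.75, which is below the default threshold of 0.8 but above the 0.2 we set for the clear command, so a lower threshold makes the command more forgiving of misheard words.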

😃We have successfully built a voice assistant using the Web Speech API that does as we say 🔥🔥

Note: As of May 2021, the following browsers support the Web Speech API:

Chrome (desktop)
Chrome (Android)
Safari 14.1
Microsoft Edge
Android webview
Samsung Internet

For all other browsers, you can integrate a polyfill.

Roopali Singh

Here's a demo that I have made with some styling:

I call it Aether
