Making a simple voice-controlled personal assistant interface using Python

mshrish

shrish

Posted on December 4, 2020

I'm using tkinter, a Python module, to create the graphical user interface (GUI) that wraps our application.

This project is an attempt to produce a simple version of Windows Cortana using Python. Though my application may not be as robust as Cortana, it can perform some of the functions that Cortana does.

The final product looks like this:
[Image: the finished interface]

As you can see, the interface contains a button ("listen") which turns on your system's microphone to recognize your speech and process your query.

Actions that can be performed:

1) Greets the user when greeted.
2) Tells the user the time, the day of the week and the month when asked.
3) Opens a web browser and searches for phrases specified by the user.
4) Opens YouTube and searches for the video specified by the user.
5) Opens Wikipedia and searches for the term specified by the user.

Modules used:-

1) Tkinter to create the GUI
2) Pyttsx3 to convert text to a synthetic voice
3) SpeechRecognition and PyAudio to recognize speech input from the user
4) Time module
5) Random module
6) Selenium to open a webdriver and automate the search engine

Make sure the modules are installed on your machine.
I'm writing this article assuming that you know the fundamentals of these modules.

Code Architecture:

In order to keep things simple I created one giant class and an instance of that class. Within the class I defined every method required. The class contains 5 parts.

The first part is an init function which defines our GUI and wraps the button inside it.
The second part contains a method which converts text to speech.
The third part contains a method which recognizes speech from the user and returns text.
The fourth part is not just one function but a bunch of functions, each with its own functionality (such as greeting the user, searching the web, fetching the time and date).
The fifth and last part of the class is a method which processes the input given by the user and produces the desired output.
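
The five parts above can be sketched as a bare skeleton. This is only an outline under assumptions: the GUI is left out, and the class name AssistantSkeleton, the `heard` argument and the `spoken` list are hypothetical stand-ins so the structure can be run without a microphone or speakers.

```python
class AssistantSkeleton:
    """Outline of the five parts; illustrative only, not the article's final code."""

    def __init__(self, heard="hello"):
        # Part 1 in the article builds the tkinter window; this sketch only
        # stores a canned phrase that stands in for microphone input.
        self.heard = heard
        self.spoken = []

    def speak(self, output):
        # Part 2: text to speech (stand-in: record and print the text).
        self.spoken.append(output)
        print(output)

    def speech_recog(self):
        # Part 3: speech to text (stand-in: return the canned phrase).
        return self.heard

    def greet(self):
        # Part 4: one of several small task methods.
        self.speak("hello")

    def process(self):
        # Part 5: read the input and route it to a task method.
        speech = self.speech_recog().lower()
        if speech in ("hi", "hello", "hey"):
            self.greet()
        else:
            self.speak("sorry, I can't do that yet")


assistant = AssistantSkeleton(heard="Hello")
assistant.process()
```

Swapping the stand-ins for real speech input and output is what the rest of the article does, one method at a time.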

Alright, let's get started with the code!

Importing modules:-

#Modules used
from tkinter import *
import pyttsx3
import speech_recognition as sr
import pyaudio
import random
import time
from tkinter import messagebox
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

Defining the class:-

Having imported the modules we shall start creating the class.

#Main class

class Window(Frame):
    #defining our main window
    def __init__(self,master):
        #initialize the underlying Frame before using it
        super().__init__(master)
        self.master=master
        master.title("DREAM")
        A=Label(master,text="try saying *what can you do*")
        A.pack(side="top")
        Button(master,text="listen",width=100,relief="groove",command=self.Processo_r).pack(side="bottom")

    #the methods defined in the rest of the article go here, inside the class

root=Tk()
#instance of the class
app=Window(root)
root.geometry("300x50")
#Runs the application until we close
root.mainloop()

This class defines a GUI window with a label saying "try saying what can you do" and a button. Note that the button has an option named "command" which is linked to a class method. It means that when the button is pressed, the "self.Processo_r" method (which will be defined further on) gets executed.

The upcoming methods are defined inside the class, so they belong above the root=Tk() line.

Defining method to convert text to speech:

    def speak(self,output):
        #initiating the speech engine
        engine = pyttsx3.init()
        #speaking the desired output 
        engine.say(output)
        engine.runAndWait()

In order to convert text to speech I'm using the Pyttsx3 module. The method has one parameter, output, which will be spoken by the synthetic voice.

Method to recognize speech and convert it to text:

    def speech_recog(self):
        #recognizer class
        r=sr.Recognizer()
        #Specifying the microphone to be activated; device_index=1 is
        #machine-specific, omit it to use the default microphone
        mic = sr.Microphone(device_index=1)

        try:
            #listening to the user
            with mic as s:
                #calibrate for background noise before listening
                r.adjust_for_ambient_noise(s)
                audio = r.listen(s, timeout=5)

            #Converting the audio to text.
            #I use the Google engine to convert the speech to text, but you
            #may use other engines such as Sphinx, IBM Speech to Text etc.
            speech = r.recognize_google(audio)
            return speech

        #Raised when the engine couldn't recognize the speech
        except sr.UnknownValueError:
            #calling the text to speech function
            self.speak("please try again, couldn't identify")

        #Raised by listen() when no speech starts within the timeout
        except sr.WaitTimeoutError:
            self.speak("please try again")

I use the SpeechRecognition module to recognize speech and convert it to text. When called, this function turns on the microphone, listens for speech, converts it to text and returns it. If nothing could be recognized, it asks the user to try again and returns nothing.

Now that I have defined three parts of the class, I can start defining the methods with specific functions.

Method to greet the user:

    def greet(self):
        #greets the user with a random phrase from A
        A=["Hi,nice to meet you","hello","Nice to meet you","hey,nice to meet you","good to meet you!"]
        b=random.choice(A)
        self.speak(b)

Method to tell time

    def tell_time(self):
        localtime = time.asctime(time.localtime(time.time()))
        a = localtime[11:16]
        self.speak(a)



This method uses the time module to get the local time of the user's device. The slice [11:16] picks the "HH:MM" part out of the asctime string, which is then spoken to the user.
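
The slice above yields a 24-hour value such as "14:05". If a 12-hour reading sounds more natural, a small helper along these lines could be used instead. This is a sketch, not part of the original application; the name spoken_time is hypothetical.

```python
import time


def spoken_time(t=None):
    """Return the time as e.g. '2:05 PM' for a time.struct_time (default: now)."""
    if t is None:
        t = time.localtime()
    # 0 and 12 o'clock both read as 12 on a 12-hour clock
    hour = t.tm_hour % 12 or 12
    suffix = "AM" if t.tm_hour < 12 else "PM"
    return f"{hour}:{t.tm_min:02d} {suffix}"


print(spoken_time())
```

Inside the class, tell_time could then call self.speak(spoken_time()).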

Method to tell day of the week:

    def tell_day(self):
        localtime = time.asctime(time.localtime(time.time()))
        day = localtime[0:3]
        if day == "Sun":
            self.speak("it's sunday")
        if day == "Mon":
            self.speak("it's monday")
        if day == "Tue":
            self.speak("it's tuesday")
        if day == "Wed":
            self.speak("it's wednesday")
        if day == "Thu":
            self.speak("it's thursday")
        if day == "Fri":
            self.speak("it's friday")
        if day == "Sat":
            self.speak("it's saturday")

This method uses the time module to get the day of the week and tells it to the user when asked.

Method to tell month of the year:


    def tell_month(self):
        localtime = time.asctime(time.localtime(time.time()))
        m_onth = localtime[4:7]
        if m_onth == "Jan":
            self.speak("it's january")
        if m_onth == "Feb":
            self.speak("it's february")
        if m_onth == "Mar":
            self.speak("it's march")
        if m_onth == "Apr":
            self.speak("it's april")
        if m_onth == "May":
            self.speak("it's may")
        if m_onth == "Jun":
            self.speak("it's june")
        if m_onth == "Jul":
            self.speak("it's july")
        if m_onth == "Aug":
            self.speak("it's august")
        if m_onth == "Sep":
            self.speak("it's september")
        if m_onth == "Oct":
            self.speak("it's october")
        if m_onth == "Nov":
            self.speak("it's november")
        if m_onth == "Dec":
            self.speak("it's december")

This method uses the time module to get the month of the year and tells it to the user when asked.
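
The two if-chains for day and month can also be written as lookups on the fields of time.localtime(). The sketch below is one possible shortening, assuming English names are wanted; the names DAYS, MONTHS, day_phrase and month_phrase are hypothetical.

```python
import time

# tm_wday counts from Monday = 0, tm_mon counts from January = 1
DAYS = ("monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday")
MONTHS = ("january", "february", "march", "april", "may", "june",
          "july", "august", "september", "october", "november", "december")


def day_phrase(t=None):
    """Return e.g. "it's friday" for a time.struct_time (default: now)."""
    if t is None:
        t = time.localtime()
    return "it's " + DAYS[t.tm_wday]


def month_phrase(t=None):
    """Return e.g. "it's december" for a time.struct_time (default: now)."""
    if t is None:
        t = time.localtime()
    return "it's " + MONTHS[t.tm_mon - 1]


print(day_phrase())
print(month_phrase())
```

The spoken phrases are the same as in the methods above; only the lookup changes.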

Method to search phrases in web browser:

    def search(self,web_name):
        self.speak("Searching")
        #Make sure that you have installed the specific driver for your
        #web browser. The executable_path could be different for you.
        #Note: executable_path and find_element_by_name belong to the
        #Selenium 3 API; recent Selenium 4 releases removed them.

        #Opening the driver (raw string keeps the backslashes literal)
        driver = webdriver.Chrome(executable_path=r"C:\Program Files (x86)\chromedriver.exe")

        #Navigating to google
        driver.get('https://www.google.com/')

        #Locating the search box
        search_engine = driver.find_element_by_name("q")

        #Typing the phrase(web_name) and hitting enter to show results
        search_engine.send_keys(web_name + Keys.ENTER)

I have used Selenium to search for the phrase specified by the user. As I said before, make sure you know the fundamentals of the modules used. Using a similar algorithm with some minor changes, let's create methods to open Google Chrome, search YouTube videos and search Wikipedia articles.

Method to Open Chrome:

    def open_chrome(self):
        self.speak("opening chrome")
        #raw string keeps the backslashes in the path literal
        driver=webdriver.Chrome(executable_path=r"C:\Program Files (x86)\chromedriver.exe")
        driver.get("https://www.google.com/")

Method to search Youtube videos

    def play_tube(self, vid_name):
        self.speak("Searching youtube")

        #initializing driver (raw string keeps the backslashes literal)
        driver = webdriver.Chrome(executable_path=r"C:\Program Files (x86)\chromedriver.exe")

        #navigating to Youtube
        driver.get('https://www.youtube.com/')

        #Locating the Youtube search box
        search_engine = driver.find_element_by_name("search_query")

        #searching the specified video
        search_engine.send_keys(vid_name + Keys.ENTER)

I have used the same algorithm to search YouTube that I used to search Google. The main differences are the site the driver navigates to and the name of the search box ("q" on Google, "search_query" on YouTube).
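
Since the search methods differ only in the site they target, the same idea can be expressed as one table of sites. The sketch below is an alternative to the Selenium approach, not the article's method: it builds a results URL and hands it to the default browser through the standard webbrowser module, so no driver is needed. The names SEARCH_URLS, build_search_url and open_search are hypothetical, and the English Wikipedia address is an assumption.

```python
import webbrowser
from urllib.parse import quote_plus

# Results-page URL patterns; {} is replaced by the encoded phrase
SEARCH_URLS = {
    "google": "https://www.google.com/search?q={}",
    "youtube": "https://www.youtube.com/results?search_query={}",
    "wikipedia": "https://en.wikipedia.org/w/index.php?search={}",
}


def build_search_url(site, phrase):
    """Return the results URL for phrase on one of the known sites."""
    return SEARCH_URLS[site].format(quote_plus(phrase))


def open_search(site, phrase):
    """Open the results page in the system's default browser."""
    webbrowser.open(build_search_url(site, phrase))


print(build_search_url("youtube", "python tkinter tutorial"))
# prints https://www.youtube.com/results?search_query=python+tkinter+tutorial
```

The trade-off is that this only opens a page; Selenium is still the tool to use if the assistant should go on to click or read things on it.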

Method for searching article in wikipedia

    def search_wiki(self, article):

        #initializing driver (raw string keeps the backslashes literal)
        driver=webdriver.Chrome(executable_path=r"C:\Program Files (x86)\chromedriver.exe")

        #Navigating to Wikipedia
        driver.get("https://www.wikipedia.org/")

        #Locating the wikipedia search box
        search_engine=driver.find_element_by_name("search")

        #Searching the specified phrase
        search_engine.send_keys(article+Keys.ENTER)

Method which displays a list containing all possible operations the application supports

    def functions(self):
        self.speak("here is a list of what i can do")
        messagebox.showinfo("DREAM functions", "1.Try saying 'Hi','Hello'" +
                            "\n2.Try asking 'What day is this?'" +
                            "\n3.Try asking 'What month is it?'" +
                            "\n4.Try asking 'What time is it?'" +
                            "\n5.Search Google by saying 'Google <anything>'" +
                            "\n6.Search YouTube by saying 'YouTube <video_name>'" +
                            "\n7.Search Wikipedia by saying 'Wikipedia <anything>'" +
                            "\n8.To close say 'Bye', 'Shutdown' or 'Quit'")


Method to quit the application:

    def shut(self):
        #bids the user goodbye and quits
        end_greet=random.choice(["bye", "good bye", "take care bye"])
        self.speak(end_greet)
        #closing the main window ends root.mainloop()
        self.master.destroy()

Defining a method to process the input:

    def Processo_r(self):
        #speech_recog returns None when nothing was recognized
        speech=self.speech_recog()
        if speech is None:
            return
        #lower-casing makes the comparisons independent of how the
        #engine capitalizes the text
        speech=speech.lower()

        A=["hi","hello","hey","hai","hey dream","hi dream","hello dream"]
        B=["what day is it","what day is today","what day is this"]
        C=["what month is it","what month is this"]
        D=["what time is it","what is the time","time please"]
        E=["bye","bye dream","shutdown","quit"]

        #elif makes sure exactly one branch runs, so the apology
        #is spoken only when nothing matched
        if speech=="what can you do":
            self.functions()
        elif speech in A:
            self.greet()
        elif speech=="who are you":
            self.speak("i'm dream")
            self.speak("your personal assistant")
        elif speech in B:
            self.tell_day()
        elif speech in C:
            self.tell_month()
        elif speech in D:
            self.tell_time()
        elif speech.startswith("google "):
            self.search(speech[7:])
        elif speech.startswith("youtube "):
            self.play_tube(speech[8:])
        elif speech=="open chrome":
            self.open_chrome()
        elif speech.startswith("wikipedia "):
            self.search_wiki(speech[10:])
        elif speech in E:
            self.shut()
        else:
            self.speak("I am sorry couldn't perform the task you specified")


This method gets executed when we press the "listen" button in the interface. It calls the speech_recog method that we defined before and stores the returned text. Then it checks the text against a series of conditions and gives the user the desired output.

After putting the code together, the application should be working. Make sure you are connected to the internet, since both the Google speech engine and the web searches need it. You can also add new methods to the class to perform whatever else you like!
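
The matching step can be tried without a microphone by pulling it out into a plain function. The sketch below is a simplified illustration that covers only the greeting, closing and search commands; the name parse_command and the (action, argument) return shape are hypothetical, not part of the class above.

```python
GREETINGS = {"hi", "hello", "hey", "hai", "hey dream", "hi dream", "hello dream"}
FAREWELLS = {"bye", "bye dream", "shutdown", "quit"}
PREFIXES = ("google", "youtube", "wikipedia")


def parse_command(speech):
    """Map recognized text to an (action, argument) pair."""
    # Tolerate None (nothing recognized) and any capitalization
    text = (speech or "").strip().lower()
    if text in GREETINGS:
        return ("greet", None)
    if text in FAREWELLS:
        return ("shut", None)
    for prefix in PREFIXES:
        # "youtube lo-fi beats" -> ("youtube", "lo-fi beats")
        if text.startswith(prefix + " "):
            return (prefix, text[len(prefix) + 1:])
    return ("unknown", None)


print(parse_command("YouTube lo-fi beats"))
# prints ('youtube', 'lo-fi beats')
```

Note that the argument comes back lower-cased, which is fine for search phrases but would need revisiting for anything case-sensitive.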

Thank you for reading :-)
If you have any queries, let me know by posting them in the discussion.
