Building our own J.A.R.V.I.S. using Python - Part I
Do you remember J.A.R.V.I.S., Tony Stark's virtual personal assistant? I'm sure you do!
Have you ever wondered about creating your own personal assistant? Yes? Tony Stark could have helped us with that! Oops, did you forget he is no longer with us? It's sad that he cannot save us anymore.
During the development of this project, we'll come across various modules and external libraries. Let's learn about and install them. But before installing anything, let's create a virtual environment and activate it.
We are going to create a virtual environment using the venv module, which now ships with Python's standard library, so there is nothing extra to install. To create a virtual environment, use the command below:
$ python -m venv env
The above command will create a virtual environment named env. Now, we need to activate it. On Windows, run:
$ env\Scripts\activate
On macOS or Linux, run:
$ source env/bin/activate
To verify that the environment has been activated, check that (env) appears in your terminal prompt. Now, we can install the libraries.
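If you want to confirm which interpreter is active, this optional one-liner prints the environment's root path; with the virtual environment active, it should point inside the env folder:
$ python -c "import sys; print(sys.prefix)"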
- pyttsx3: pyttsx3 is a cross-platform text-to-speech library. The major advantage of using this library for text-to-speech conversion is that it works offline. To install this module, type the below command in the terminal:
$ pip install pyttsx3
- SpeechRecognition: It allows us to convert audio into text for further processing. To install this module, type the below command in the terminal:
$ pip install SpeechRecognition
- pywhatkit: It is an easy-to-use library that helps us interact with the browser. To install the module, run the following command in the terminal:
$ pip install pywhatkit
- wikipedia: It is used to fetch a variety of information from the Wikipedia website. To install this module, type the below command in the terminal:
$ pip install wikipedia
- requests: It is an elegant and simple HTTP library for Python that allows you to send HTTP/1.1 requests extremely easily. To install the module, run the following command in the terminal:
$ pip install requests
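If you prefer, all five libraries can also be installed with a single command:
$ pip install pyttsx3 SpeechRecognition pywhatkit wikipedia requests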
.env File
We need this file to store private data such as API keys and passwords related to the project. For now, let's store the names of the user and the bot.
Create a file named .env and add the following content there:
USER=Ashutosh
BOTNAME=JARVIS
To use the contents of the .env file, we'll install another module called python-decouple:
$ pip install python-decouple
Learn more about Environment Variables in Python here.
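As a small optional pattern, config() also accepts a default value, so the script can still run when a key is missing from the .env file:
from decouple import config

# Falls back to 'User' if USER is not defined in .env
USERNAME = config('USER', default='User')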
Setting up JARVIS
Before we start defining a few important functions, let's first create a speech engine.
import pyttsx3
from decouple import config
USERNAME = config('USER')
BOTNAME = config('BOTNAME')
# Initialize the speech engine with the sapi5 driver (Microsoft Speech API)
engine = pyttsx3.init('sapi5')
# Set Rate
engine.setProperty('rate', 190)
# Set Volume
engine.setProperty('volume', 1.0)
# Set Voice (Female)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)
Let's analyze the above script. First of all, we have initialized an engine using the pyttsx3 module. sapi5 is a Microsoft Speech API that helps us use the voices; note that this driver is available only on Windows. Learn more about it here. Next, we set the rate and volume properties of the speech engine using the setProperty method. Then, we get the voices from the engine using the getProperty method. voices will be a list of the voices available on our system. If we print it, we can see something like this:
[<pyttsx3.voice.Voice object at 0x000001AB9FB834F0>, <pyttsx3.voice.Voice object at 0x000001AB9FB83490>]
The first one is a male voice and the other one is a female voice. JARVIS was a male assistant in the movies, but I've chosen to set the voice property to the female voice for this tutorial using the setProperty method.
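Note that the order of voices differs from system to system, so index 1 may not be a female voice on your machine. If it isn't, you can list every installed voice and pick the index you want. A small optional snippet using the engine defined above:
# Optional: inspect the voices installed on this system
for index, voice in enumerate(engine.getProperty('voices')):
    print(index, voice.id, voice.name)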
Note: If you get an error related to PyAudio, download a PyAudio wheel from here and install it within the virtual environment.
Also, using the config method from decouple, we get the values of USER and BOTNAME from the environment variables.
1. Speak Function
The speak function will be responsible for speaking whatever text is passed to it. Let's see the code:
# Text to Speech Conversion
def speak(text):
    """Used to speak whatever text is passed to it"""
    engine.say(text)
    engine.runAndWait()
In the speak() method, the engine speaks whatever text is passed to it using the say() method. The runAndWait() method blocks while the event loop processes the queued commands and returns when the queue is cleared.
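As a quick sanity check, you can call the function with any string, for example:
speak("Hello, I am online.")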
2. Greet User Function
This function will be used to greet the user whenever the program is run. According to the current time, it greets the user with Good Morning, Good Afternoon, or Good Evening.
from datetime import datetime
# Greet the user
def greet_user():
    """Greets the user according to the time"""
    hour = datetime.now().hour
    if (hour >= 6) and (hour < 12):
        speak(f"Good Morning {USERNAME}")
    elif (hour >= 12) and (hour < 16):
        speak(f"Good Afternoon {USERNAME}")
    elif (hour >= 16) and (hour < 19):
        speak(f"Good Evening {USERNAME}")
    speak(f"I am {BOTNAME}. How may I assist you?")
First, we get the current hour, i.e., if the current time is 11:15 AM, the hour will be 11. If the value of hour is between 6 and 12, we wish Good Morning to the user. If it is between 12 and 16, we wish Good Afternoon, and similarly, if it is between 16 and 19, we wish Good Evening. (For any other hour, no time-based greeting is spoken.) We use the speak method to greet the user.
3. Take User Input Function
This function takes commands from the user and recognizes them using the speech_recognition module.
import speech_recognition as sr
from random import choice
from utils import opening_text
# Takes Input from User
def take_user_input():
    """Takes user input, recognizes it using the speech_recognition module, and converts it into text"""
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Listening....')
        r.pause_threshold = 1
        audio = r.listen(source)

    try:
        print('Recognizing...')
        query = r.recognize_google(audio, language='en-in')
        # Keep going unless the user asked to exit or stop
        if 'exit' not in query and 'stop' not in query:
            speak(choice(opening_text))
        else:
            hour = datetime.now().hour
            # Between 9 PM and 6 AM, wish the user good night before exiting
            if hour >= 21 or hour < 6:
                speak("Good night sir, take care!")
            else:
                speak('Have a good day sir!')
            exit()
    except Exception:
        speak('Sorry, I could not understand. Could you please say that again?')
        query = 'None'
    return query
We have imported the speech_recognition module as sr. The Recognizer class within the speech_recognition module helps us recognize the audio. The same module has a Microphone class that gives us access to the microphone of the device. So, with the microphone as the source, we try to listen to the audio using the listen() method of the Recognizer class. We have also set the pause_threshold to 1, i.e., the recognizer will not end the phrase even if we pause for up to one second while speaking.
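As a side note, if recognition struggles in a noisy room, the Recognizer class also provides adjust_for_ambient_noise(), which listens to the background for a moment and calibrates the energy threshold. An optional variant of the listening block:
# Optional: calibrate for background noise before listening
with sr.Microphone() as source:
    print('Listening....')
    r.adjust_for_ambient_noise(source, duration=1)  # sample ~1 second of ambient sound
    r.pause_threshold = 1
    audio = r.listen(source)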
Next, using the recognize_google() method from the Recognizer class, we try to recognize the audio. The recognize_google() method performs speech recognition on the audio passed to it using the Google Speech Recognition API. We have set the language to en-in, i.e., English (India). It returns the transcript of the audio, which is nothing but a string. We've stored it in a variable called query.
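As a side note, recognize_google() raises specific exceptions: sr.UnknownValueError when the audio is unintelligible and sr.RequestError when the API cannot be reached. A more granular variant could catch them separately instead of a bare Exception; a sketch:
try:
    query = r.recognize_google(audio, language='en-in')
except sr.UnknownValueError:
    # The audio could not be understood
    speak('Sorry, I could not understand. Could you please say that again?')
    query = 'None'
except sr.RequestError:
    # The Google API was unreachable, e.g. no internet connection
    speak('Sorry, the speech service seems to be unavailable right now.')
    query = 'None'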
If the query contains the words exit or stop, it means we're asking the assistant to stop immediately. So, before stopping, we greet the user again as per the current hour. If the hour is 21 or later, or earlier than 6, we wish Good Night to the user; otherwise, a different message. After speaking, we exit the program. We also create a utils.py file which has just one list containing a few statements:
opening_text = [
    "Cool, I'm on it sir.",
    "Okay sir, I'm working on it.",
    "Just a second sir.",
]
If the query doesn't contain those two words (exit or stop), we say something to tell the user that they have been heard. For that, we use the choice method from the random module to randomly select a statement from the opening_text list.
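For reference, choice() picks one item from the list uniformly at random, so the acknowledgement varies from run to run; a tiny demo:
from random import choice
from utils import opening_text

print(choice(opening_text))  # might print e.g. "Just a second sir."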
During this entire process, if we encounter an exception, we apologize to the user and set the query to the string 'None'. In the end, we return the query.
To run the project, we use the standard __main__ entry point.
if __name__ == '__main__':
    greet_user()
    while True:
        query = take_user_input().lower()
        print(query)
As we know, the first thing we need to do is greet the user using the greet_user() function. Next, we run a while loop to continuously take input from the user using the take_user_input() function. For now, we're just printing the query.
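For example, if you say "Hello there", the console output would look roughly like this (the exact transcript depends on what Google's API returns):
Listening....
Recognizing...
hello there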
For now, the complete code in main.py looks like this:
import pyttsx3
import speech_recognition as sr
from decouple import config
from datetime import datetime
from random import choice
from utils import opening_text
USERNAME = config('USER')
BOTNAME = config('BOTNAME')
# Initialize the speech engine with the sapi5 driver (Microsoft Speech API)
engine = pyttsx3.init('sapi5')
# Set Rate
engine.setProperty('rate', 190)
# Set Volume
engine.setProperty('volume', 1.0)
# Set Voice (Female)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)
# Text to Speech Conversion
def speak(text):
    """Used to speak whatever text is passed to it"""
    engine.say(text)
    engine.runAndWait()
# Greet the user
def greet_user():
    """Greets the user according to the time"""
    hour = datetime.now().hour
    if (hour >= 6) and (hour < 12):
        speak(f"Good Morning {USERNAME}")
    elif (hour >= 12) and (hour < 16):
        speak(f"Good Afternoon {USERNAME}")
    elif (hour >= 16) and (hour < 19):
        speak(f"Good Evening {USERNAME}")
    speak(f"I am {BOTNAME}. How may I assist you?")
# Takes Input from User
def take_user_input():
    """Takes user input, recognizes it using the speech_recognition module, and converts it into text"""
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Listening....')
        r.pause_threshold = 1
        audio = r.listen(source)

    try:
        print('Recognizing...')
        query = r.recognize_google(audio, language='en-in')
        # Keep going unless the user asked to exit or stop
        if 'exit' not in query and 'stop' not in query:
            speak(choice(opening_text))
        else:
            hour = datetime.now().hour
            # Between 9 PM and 6 AM, wish the user good night before exiting
            if hour >= 21 or hour < 6:
                speak("Good night sir, take care!")
            else:
                speak('Have a good day sir!')
            exit()
    except Exception:
        speak('Sorry, I could not understand. Could you please say that again?')
        query = 'None'
    return query
if __name__ == '__main__':
    greet_user()
    while True:
        query = take_user_input().lower()
        print(query)
You can run and test the application now.
$ python main.py
Conclusion
In this part, we have completed the setup of our virtual personal assistant. We have not added any functionality to it yet; we'll work on those functionalities in the next part of this blog. Stay tuned!