Expressando (Part 1): A real-time sign language detection system
Pritam
Posted on January 22, 2022
Expressando is a real-time sign language detection system made using OpenCV, Keras, TensorFlow and SciPy. It is a beginner-friendly project, suitable for enthusiasts in the fields of machine learning and deep learning. The project primarily aims to highlight the basic use of OpenCV in image processing and manipulation, the training of models with Keras using TensorFlow as the backend, and finally, the detection of our customised sign language after constructing a Convolutional Neural Network on it. So, without any further ado, let us begin with the tutorial:
Expressando has been written in Python. So, before we start, you can have a quick recapitulation of the basics from the following resources, which will be beneficial for a better understanding of the concepts:
Python Beginner's Guide
W3Schools
Tutorials Point
Also, make sure that you have Python installed on your system.
In case you do not have Python pre-installed, download the latest version of Python from here. You will also have to create a virtual environment for this project.
What is a 'Virtual Environment'?
A virtual environment is an isolated Python environment where a project's dependencies and packages are installed in a directory separate from the packages installed in the system's default Python path (known as the base environment) and from other virtual environments. It is analogous to a 'container', where you have all your required dependencies installed and ready to be used in your project. To know how to create a virtual environment, click here.
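For reference, a virtual environment named "env" can typically be created and activated with the following commands (assuming Python 3 is installed; the activation command depends on your operating system):

python -m venv env
# On Windows (PowerShell):
.\env\Scripts\Activate.ps1
# On Linux/macOS:
source env/bin/activate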
Download the "requirements.txt" from the given link: requirements.txt.
Copy the "requirements.txt" file and store it under the directory "TDoC-2021". You will have the following folder structure:
├── TDoC-2021
| ├── env
| ├── requirements.txt
Now type the following command in your Terminal window:
pip3 install -r requirements.txt
You will now have all the required dependencies and Python packages with their appropriate versions installed in your virtual environment named "env". You can check whether the dependencies are installed according to the "requirements.txt" file by the following command:
pip3 list
This command lists all the dependencies installed in your environment.
Configuring Input through Webcam using OpenCV
After setting up your virtual environment, it is time to configure your digital input. The first step of any image manipulation project is the configuration of digital image input using OpenCV. So let us first configure the basic webcam input.
Step 1: Create a file named "check.py" inside the "TDoC-2021" directory. As the name suggests, we are checking for the input through the webcam using the OpenCV library. Open the file in your code-editor/IDE. The folder structure would look like the following:
├── TDoC-2021
| ├── env
| ├── check.py
| ├── requirements.txt
Step 2: First, import OpenCV into the "check.py" file. It can be accomplished by the following line of code:
import cv2
After importing cv2, we need to create a VideoCapture object, which will initiate the process to retrieve the input through the webcam.
cap = cv2.VideoCapture(0)
Here, "cap" refers to the object that is created using OpenCV to capture the video. It basically returns the video from the first webcam on your computer.
The value "0" indicates that the input will be configured through the first webcam of your computer.
For example, if you want to configure the input through your second webcam, you have to pass "1" instead of "0" as the parameter. In simple words, to configure the input through the "n-th" webcam, you must pass "n-1" as the parameter to the VideoCapture method.
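If you are unsure which index your webcam uses, a small probe like the sketch below can help. This is only an illustrative snippet, not part of the project code:

import cv2

# Try the first few camera indices and report which ones respond.
# Index availability depends entirely on your hardware.
for index in range(3):
    cap = cv2.VideoCapture(index)
    if cap.isOpened():
        print("Camera available at index", index)
    cap.release()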
Step 3: This step involves rendering a while loop to continuously read input through the webcam, with the help of a suitable condition. In this step, we will discuss the most common and important methods present in the OpenCV library, which are required for making basic projects and for developing a sound understanding of the various methods in OpenCV and their uses. OpenCV is home to a huge number of methods and functions, so we will discuss only the important ones that beginners need to understand.
Continue in the code-editor as follows:
while cap.isOpened():
    ret, img = cap.read()
    img = cv2.flip(img, 1)
The function cap.isOpened() checks whether the VideoCapture object (here 'cap') is functional or not, usually by checking the response from the webcam under consideration. This code initiates an infinite loop (to be broken later by a break statement), where "ret" and "img" are obtained from cap.read(). Basically, ret is a boolean indicating whether a frame was returned at all, while img contains each frame that is returned, in the form of an image array vector.
This is practised in order to avoid unnecessary IO errors: in case no frame is returned, ret will be False and img will be None, instead of an IO error being thrown.
The next line of code introduces us to the method flip(). This method inverts the frame under consideration laterally, so the input behaves like a mirror. It is beneficial, as it eases the orientation of the webcam input.
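For reference, the second argument of flip() is a flip code; the mirror effect used here corresponds to a flip code of 1. An illustrative summary:

img_mirror = cv2.flip(img, 1)    # flip around the y-axis (lateral/mirror flip, used here)
img_vertical = cv2.flip(img, 0)  # flip around the x-axis (vertical flip)
img_both = cv2.flip(img, -1)     # flip around both axes (180-degree rotation)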
Step 4:
cv2.rectangle(img, (20, 20), (250, 250), (255, 0, 0), 3)
cv2.imshow("RGB Output", img)
img1 = img[20:250,20:250]
imCopy = img1.copy()
gray = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
In the next lines of code, we are introduced to a few other methods in the OpenCV library. The method rectangle() enables us to draw a rectangle of our desired shape on the frame taken into consideration.
It has the following parameters:
- img: It is the frame taken into consideration, on which the rectangle is to be drawn.
- (20, 20): It is the starting coordinate of the rectangle, represented as a tuple of two values: the X and Y coordinates respectively.
- (250, 250): It is the ending coordinate of the rectangle, represented as a tuple in the same way as the starting point. The two tuples mark the ends of a diagonal of the rectangle; if the width and height they span are equal, the result is a square.
- (255, 0, 0): It is the colour of the border line of the rectangle, passed in the form of a BGR index. The BGR index comprises Blue, Green and Red colour values, each ranging from 0 to 255, which are used to define other colours as well. Here, (255, 0, 0) denotes the colour blue.
- 3: It denotes the thickness of the rectangle's border line in px.
The method imshow() shows the image in the form of an independent window. It has two parameters: The name of the window and the image to be displayed.
Next, we extract the region covered by the rectangle in the form of an array of pixels named "img1". We also make a copy of the extracted image, named "imCopy", using the copy() function.
Then we are introduced to the method cvtColor(). This method is used to convert the image into a different colour space. There are more than a hundred colour-space conversions available in OpenCV, but we will be using COLOR_BGR2GRAY for now. It takes the image under consideration, in BGR form, and converts the entire image into grayscale. We name the grayscale image gray.
The left image is the original image, while the right image represents its grayscale form.
We will also use the GaussianBlur() method here. It is an image-smoothing technique (also known as blurring), used to reduce the amount of luminance noise in the image. We store the blurred image as blur.
The left image is the original image, while the right image represents its blurred form.
It has the following parameters:
- gray: It is the frame taken into consideration, on which the method is to be applied.
- (5, 5): It is the Gaussian kernel size along the X and Y axes, passed in the form of a tuple.
- 0: It is the standard deviation along the X axis (sigmaX); passing 0 tells OpenCV to compute it from the kernel size.
Step 5:
ret, thresh1 = cv2.threshold(blur, 10, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
hand_resize = cv2.resize(thresh1, (width, height))
cv2.imshow("Threshold", thresh1)
Thresholding is a technique in OpenCV for assigning pixel values in relation to a provided threshold value: each pixel value is compared with the threshold. It is one of the most common (and basic) segmentation techniques in computer vision, and it allows us to separate the foreground (i.e., the objects we are interested in) from the background of the image. A threshold is a value with two regions on either side of it, i.e. below the threshold and above the threshold. If the pixel value is smaller than the threshold, it is set to 0; otherwise, it is set to a maximum value.
Here, ret holds the threshold value that was actually used (with Otsu's method, it is determined automatically), while thresh1 contains our thresholded image. We also define the width and the height in the form of a tuple, before the initialisation of the cap object.
There are mainly three types of thresholding techniques:
- Simple Threshold: In this type of thresholding, we manually supply parameters to segment the image — this works extremely well in controlled lighting conditions where we can ensure high contrast between the foreground and background of the image. The parameters are discussed later.
- Adaptive Threshold: In this type of thresholding, instead of trying to threshold an image globally using a single value, the image is broken down into smaller pieces, each of which is thresholded separately and individually. It is better in limited lighting conditions (a sketch of an adaptive-threshold call is shown after this list).
- OTSU Threshold: In Otsu Thresholding, the value of the threshold is not defined but is determined automatically. This works well when we are not sure of the lighting conditions. This is an additive module, i.e, it is applied in addition to Simple or Adaptive threshold and works well with grayscale images.
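Since this tutorial only uses the simple and Otsu techniques, here is a hedged sketch of what an adaptive-threshold call could look like on the same blurred grayscale image (the block size of 11 and constant 2 are illustrative values, not tuned for this project):

# Threshold computed per 11x11 neighbourhood (Gaussian-weighted), offset by 2.
adaptive = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)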
The function threshold() has the following parameters:
- blur: It is the input image array (here, the blurred grayscale frame) on which thresholding is applied.
- 10: It is the threshold value, below and above which pixel values change accordingly. (When cv2.THRESH_OTSU is used, this value is ignored and the threshold is determined automatically.)
- 255: The maximum value that can be assigned to a pixel, in general the intensity of a colour ranges from 0 to 255.
- cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU: The type of thresholding that is applied to the image.
There are other thresholding techniques as well:
- cv2.THRESH_BINARY: If the pixel intensity is greater than the set threshold, the value is set to 255 (white); otherwise it is set to 0 (black). Here the brighter pixels are converted to white and the darker pixels to black.
- cv2.THRESH_BINARY_INV: If the pixel intensity is greater than the set threshold, the value is set to 0 (black); otherwise it is set to 255 (white). Here the brighter pixels are converted to black and the darker pixels to white.
- cv2.THRESH_TRUNC: If pixel intensity value is greater than threshold, it is truncated to the mentioned threshold. The pixel values are set to be the same as the threshold. All other values remain the same.
- cv2.THRESH_TOZERO: The pixel intensity is set to 0 for all pixels with intensity less than the threshold value.
- cv2.THRESH_TOZERO_INV: The pixel intensity is set to 0 for all pixels with intensity greater than the threshold value.
The thresholded image of the region under consideration is displayed using the imshow() function.
The above are the examples of the thresholding modules.
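To make the behaviour of these flags concrete, here is a tiny illustrative example on a hand-made grayscale array (the pixel values are arbitrary):

import numpy as np
import cv2

pixels = np.array([[0, 100, 150, 255]], dtype=np.uint8)
_, binary = cv2.threshold(pixels, 127, 255, cv2.THRESH_BINARY)
_, binary_inv = cv2.threshold(pixels, 127, 255, cv2.THRESH_BINARY_INV)
print(binary)      # [[  0   0 255 255]] - brighter pixels become white
print(binary_inv)  # [[255 255   0   0]] - brighter pixels become black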
Step 6:
contours, hierarchy = cv2.findContours(thresh1, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(imCopy, contours, -1, (0, 255, 0))
cv2.imshow('Draw Contours', imCopy)
Contours are defined as the curve joining all the points along the boundary of an image that have the same intensity. Contours come in handy in shape analysis, finding the size of the object of interest, and object detection. A contour is described by the minimum number of edges required to define the shape under consideration. Contour detection works best with thresholded and grayscale images, and is done by the function findContours().
Normally, we use the cv2.findContours() function to detect objects in an image. Sometimes the objects are in different locations, and in some cases some shapes lie inside other shapes, like nested or concentric figures. In such cases, we call the outer one the parent and the inner one the child. This way, the contours in an image have relationships with one another, and we can specify how one contour is connected to the others: whether it is the child of some other contour, or a parent, and so on. The representation of these relationships is called the Hierarchy.
The above picture represents the hierarchy of the contours. Contours that have the same integer have the same hierarchy.
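As a small illustration of hierarchy (separate from the project code), the sketch below draws one square inside another and prints the hierarchy array; each row has the form [next, previous, first_child, parent], with -1 meaning 'none':

import numpy as np
import cv2

canvas = np.zeros((200, 200), dtype=np.uint8)
cv2.rectangle(canvas, (20, 20), (180, 180), 255, 2)  # outer square (parent)
cv2.rectangle(canvas, (60, 60), (140, 140), 255, 2)  # inner square (child)
contours, hierarchy = cv2.findContours(canvas, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
print(hierarchy)  # parent/child indices show how the squares are nested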
The function has the following parameters:
- thresh1: The input image array from which the contours are to be detected.
- cv2.RETR_TREE: This is known as the Contour Retrieval Method.
- cv2.CHAIN_APPROX_SIMPLE: This is known as the Contour Approximation Method.
Contour Retrieval Methods are of the following types:
- cv2.RETR_EXTERNAL: It retrieves only the extreme outer contours. It sets hierarchy[i][2]=hierarchy[i][3]=-1 for all the contours. This gives only the "outer" contours, so if you have (say) one contour enclosing another (like concentric circles), only the outermost is given.
- cv2.RETR_LIST: It retrieves all of the contours without establishing any hierarchical relationships. This is applied when the hierarchy and topology of the object cannot be determined beforehand.
- cv2.RETR_CCOMP: It retrieves all of the contours and organizes them into a two-level hierarchy. At the top level, there are the external boundaries of the components. At the second level, there are the boundaries of the holes. If there is another contour inside a hole of a connected component, it is still put at the top level. (ADVANCED)
- cv2.RETR_TREE: It retrieves all of the contours and reconstructs a full hierarchy of nested contours, establishing complete hierarchical relations between them.
Contour Approximation Methods are of the following types:
- cv2.CHAIN_APPROX_NONE: It stores all the points along the boundary of the shape under consideration. It requires a huge amount of memory to store each point, which greatly reduces the speed of execution.
- cv2.CHAIN_APPROX_SIMPLE: It removes all redundant points and compresses the contour, thereby saving memory. It stores only the key turning points of the shape under consideration, and by reducing the number of points it saves a lot of memory and increases the speed of execution.
The examples of the approximation methods are shown as above.
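A quick way to feel the difference is to count the points each method stores for the same shape; a minimal sketch (the point counts are indicative):

import numpy as np
import cv2

canvas = np.zeros((200, 200), dtype=np.uint8)
cv2.rectangle(canvas, (50, 50), (150, 150), 255, -1)  # a filled white square

contours_none, _ = cv2.findContours(canvas, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
contours_simple, _ = cv2.findContours(canvas, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(len(contours_none[0]))    # every boundary point (hundreds)
print(len(contours_simple[0]))  # only the 4 corner points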
The function drawContours() is used to draw the contours that have been traced, superimposed on top of an image. In case we do not want to display them over any particular image, they can be drawn on a plain black canvas instead (as we will do later).
The function has the following parameters:
- imCopy: The input image array on which the contours are to be displayed.
- contours: This refers to the 'contours' array that was declared and initialised by the findContours() function.
- -1: It is the parameter to show all the contours in the 'contours' array. However, if you want to display a specific contour according to the hierarchy, pass the desired number as the parameter. For example, to get the 3rd contour, you have to pass 2 as a parameter.
- (0, 255, 0): It is the color of contour which is to be drawn, passed in the form of BGR index. Here, (0, 255, 0) denotes the green colour.
Then we display the contours, superimposed on "imCopy" image using the imshow() function.
Step 7: Now, after checking for the input, it is time to terminate the while loop, close all the windows, and release our VideoCapture object.
To exit the program on a specified keyboard interrupt, type the following code:
k = cv2.waitKey(10) & 0xFF
if k == 27:
    break

cap.release()
cv2.destroyAllWindows()
The cv2.waitKey(10) function returns -1 when no key is pressed within its 10 ms wait. As soon as a key event occurs (a button is pressed; here, 27 is the ASCII value of the Escape key), it returns a 32-bit integer. (ADVANCED)
The 0xFF in this scenario represents binary 11111111, an 8-bit value; since we only require 8 bits to represent a character, we AND waitKey(10) with 0xFF. As a result, we obtain an integer that is at most 255. ord(char) returns the ASCII value of a character, which is also at most 255 (we often use 'q' as the keybinding to 'quit'). Hence, by comparing the masked integer to the ord(char) value, we can check for a key-press event and break the loop. 0xFF is a hexadecimal literal, known here as a bit mask. (ADVANCED)
For reference, 32 is the ASCII value of the space character, produced by the Space Bar.
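For example, if you would rather quit with the 'q' key instead of Escape, the comparison could be written as follows (a small illustrative variation on the code above):

k = cv2.waitKey(10) & 0xFF
if k == ord('q'):  # ord('q') returns 113, the ASCII value of 'q'
    break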
Now the loop breaks when the key is entered, and exits the control out of the loop.
The function cap.release() closes the webcam input and prevents any resource errors. The function cv2.destroyAllWindows() destroys all the opened windows rendered by the imshow() functions and deallocates the memory used by the image vector arrays and frees them.
Now your check.py should look like the following.
import cv2

(width, height) = (130, 100)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, img = cap.read()
    img = cv2.flip(img, 1)
    cv2.rectangle(img, (20, 20), (250, 250), (255, 0, 0), 3)
    cv2.imshow("RGB Output", img)
    img1 = img[20:250, 20:250]
    imCopy = img1.copy()
    gray = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    ret, thresh1 = cv2.threshold(blur, 10, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    hand_resize = cv2.resize(thresh1, (width, height))
    cv2.imshow("Threshold", thresh1)
    contours, hierarchy = cv2.findContours(thresh1, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cv2.drawContours(imCopy, contours, -1, (0, 255, 0))
    cv2.imshow('Draw Contours', imCopy)
    k = 0xFF & cv2.waitKey(10)
    if k == 27:
        break

cap.release()
cv2.destroyAllWindows()
Run the code in your Powershell/terminal using
python check.py
To take input from a static image, use the following code:
import cv2

img = cv2.imread('abc.jpg')
img2 = cv2.resize(img, (200, 400))
cv2.imshow("Image", img2)
interrupt = cv2.waitKey(0) & 0xFF
if interrupt == 27:
    cv2.destroyAllWindows()
Run the code in your Powershell/terminal using
python check.py
ASSIGNMENT 1: Use any 3 functions/modules in OpenCV, and commit the code in the official Expressando TDoC 2021 Repository.
The GitHub repository where you will push your code for the respective projects assigned is ready.
Link to the repo: Github Repo
You are also asked to follow the tutorial video attached below, which clearly describes how to push your code specifically for the TDoC event.
Link to the video: TDoC Instruction Video
Here is also an introductory video on the basics of Git & GitHub, so that you are well versed with the git system.
Link to the video: Git and Github
Checking for Convexity Defects in the Camera Input
Since the initial input has been configured through the webcam, it is important to understand the concept of a "defect", a basic and fundamental idea in the domain of detection. In this session, we are going to learn about defects and detect them in our digital video input.
Step 1: Create a file named 'defects.py' inside the 'TDoC-2021' directory. As the name suggests, we are checking for the defects in the images taken through the webcam using the OpenCV library. Open the file in your code-editor/IDE. The folder structure would look like the following:
├── TDoC-2021
| ├── env
| ├── check.py
| ├── defects.py
| ├── requirements.txt
Step 2: First, import OpenCV, NumPy (as np), and math into the 'defects.py' file. Here, math is part of the standard Python library and need not be installed separately. This can be accomplished by the following lines of code:
import cv2
import numpy as np
import math
After importing the packages, we need to create a VideoCapture object, which will initiate the process to retrieve the input through the webcam.
cap = cv2.VideoCapture(0)
Step 3: The next step involves rendering a while loop to continuously read input through the webcam, with the help of a suitable condition. In this step, we will again use the most common and important methods present in the OpenCV library, which are required for making basic projects and for developing a sound understanding of the various OpenCV methods and their uses.
Continue in the code-editor as follows:
while cap.isOpened():
    ret, img = cap.read()
    img = cv2.flip(img, 1)
    cv2.rectangle(img, (20, 20), (250, 250), (255, 0, 0), 3)
    crop_img = img[20:250, 20:250]
    cv2.imshow('Gesture', img)
    grey = cv2.cvtColor(crop_img, cv2.COLOR_BGR2GRAY)
    value = (35, 35)
    blurred = cv2.GaussianBlur(grey, value, 0)
    _, thresh1 = cv2.threshold(blurred, 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    cv2.imshow('Binary Image', thresh1)
The above lines of code are just a recap of what we did in Day 3 (REFER TO THE DOCUMENTATION OF DAY-3). Here, we initialise a while loop, which iterates as long as the webcam input returns a frame, i.e. cap.isOpened() returns True. cap.read() takes the input in the form of an image array vector, and the flip() function returns the laterally inverted image of the frame under consideration. We mark out a region by means of the rectangle() function and extract it, naming it crop_img. The full frame is displayed under the name "Gesture", using the function imshow().
Then we apply the cvtColor() function to convert the image into its grayscale equivalent using the cv2.COLOR_BGR2GRAY method, naming it grey. Next, we declare the tuple value, which holds the Gaussian kernel size along the x and y axes; it is later passed as a parameter to the GaussianBlur() function. The blurred image is named blurred. Then we apply a threshold using the modules cv2.THRESH_BINARY_INV and cv2.THRESH_OTSU, naming the resultant image thresh1. It is displayed under the name "Binary Image", using the function imshow().
Step 4:
contours, hierarchy = cv2.findContours(thresh1.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
cnt = max(contours, key=lambda x: cv2.contourArea(x))
x, y, w, h = cv2.boundingRect(cnt)
cv2.rectangle(crop_img, (x, y), (x + w, y + h), (0, 0, 255), 2)
Next, we derive the contours from the thresholded image, using cv2.RETR_TREE as the retrieval method and cv2.CHAIN_APPROX_NONE as the approximation method. We then store the contours in the array named contours, while the hierarchy order is stored in hierarchy. We define cnt as the contour that encloses the maximum area; this corresponds to the object (the hand), as it will have the maximum area under consideration. The function contourArea() returns the area enclosed by a contour, and the max() function returns the contour with the maximum area.
The key used here is a lambda: an anonymous function that, given a contour, returns its enclosed area, so that max() can compare the contours by area. (ADVANCED)
cv2.boundingRect() is a function used to create an approximate rectangle of minimum area which encloses the object/contour that is passed into the function as a parameter. This function’s primary use is to highlight the area of interest after obtaining the image’s outer shape, or the external contour. With proper markings, the users can easily highlight the desired aspect in an image. It ensures clear focus and better understanding of the operations.
cv2.boundingRect() returns 4 numeric values when the contour is passed as an argument. These 4 values correspond to x, y, w and h respectively. These values are described as follows:
- x - X coordinate of the contour, closest to the origin (Top left of the window)
- y - Y coordinate of the contour, closest to the origin (Top left of the window)
- w - Width of the rectangle which will enclose the contour.
- h - Height of the rectangle which will enclose the contour.
The above shows the use of the boundingRect() function to enclose all 3 shapes in the figure.
Next, we draw a rectangle using the rectangle() function, with the coordinates (x, y) to (x+w, y+h) as the diagonal, over crop_img. This serves as an enclosure for the contour.
Step 5:
hull = cv2.convexHull(cnt)
drawing = np.zeros(crop_img.shape, np.uint8)
cv2.drawContours(drawing, [cnt], 0, (0, 255, 0), 0)
cv2.drawContours(drawing, [hull], 0, (0, 0, 255), 0)
cv2.imshow('Contours', drawing)
What is a 'Convex Hull'?
A Convex object is one with no interior angles greater than 180 degrees. A shape that is not convex is called Non-Convex or Concave. Hull means the exterior or the shape of the object. Therefore, the Convex Hull of a shape or a group of points is a tight fitting convex boundary around the points or the shape. Any deviation of the object from this hull can be considered as convexity defect.
This is an example of a convex hull.
How to display the Convex Hull ?
OpenCV provides a function convexHull() which stores all the points of the hull in the form of list/array of points, on passing cnt as the contour array.
The next line of the program uses the function np.zeros() to create a NumPy array of the same shape as crop_img with every pixel set to zero, i.e. an entirely black image, named drawing.
Here, we have used the black background to visualise the contours clearly. np.uint8 is an 8-bit unsigned integer type, used to define the pixel format of the image.
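As a quick illustration (assuming the 230x230 BGR crop produced by img[20:250, 20:250]), np.zeros() yields an all-black canvas of the same shape:

import numpy as np

drawing = np.zeros((230, 230, 3), np.uint8)  # same shape as crop_img: all pixels (0, 0, 0)
print(drawing.shape, drawing.max())          # (230, 230, 3) 0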
Then we use the drawContours() function to draw the contour and the hull using green and red colours respectively, over the image "drawing", which is the black coloured background of the same size as "crop_img". Then we show the output under the name "Contours", using the function imshow().
Step 6: Next, we have to detect the defects by making use of the Convex Hull.
What are 'Convexity Defects'?
Any deviation of the contour from its convex hull is known as the convexity defect.
OpenCV provides a function cv2.convexityDefects() for finding the convexity defects of a contour. This takes as input, the contour and its corresponding hull indices and returns an array containing the convexity defects as output.
This figure shows the depiction of the hull, contours and the defect.
hull = cv2.convexHull(cnt, returnPoints=False)
defects = cv2.convexityDefects(cnt, hull)
We redeclare hull with an extra parameter, returnPoints=False. This will give us the indices of the contour points that make up the hull. The function convexityDefects() is then used to find the defects directly, by passing the contour array (cnt) and the hull.
Convexity Defects returns an array where each row contains these values:
- start point as 's'
- end point as 'e'
- farthest point as 'f'
- approximate distance to farthest point as 'd'
Step 7: Now we use some mathematical expressions to determine the number of convexity defects in the hull, and count them accordingly.
for i in range(defects.shape[0]):
    s, e, f, d = defects[i, 0]
    start = tuple(cnt[s][0])
    end = tuple(cnt[e][0])
    far = tuple(cnt[f][0])
    a = math.sqrt((end[0] - start[0])**2 + (end[1] - start[1])**2)
    b = math.sqrt((far[0] - start[0])**2 + (far[1] - start[1])**2)
    c = math.sqrt((end[0] - far[0])**2 + (end[1] - far[1])**2)
    angle = math.acos((b**2 + c**2 - a**2) / (2 * b * c)) * 57
    if angle <= 90:
        count_defects += 1
        cv2.circle(crop_img, far, 1, [0, 0, 255], -1)
    cv2.line(crop_img, start, end, [0, 255, 0], 2)
Here, defects is an array where each row contains the values [start point, end point, farthest point, approximate distance to farthest point], i.e. (s, e, f, d). These values are indices into the contour array, so we convert each into a tuple of coordinates, named start, end and far. The start and end points lie on the convex hull, whereas the far point is the point on the contour farthest from the hull. We then use the basic distance formula to calculate the lengths of a, b and c.
Now, this is Math time! Let’s understand the cosine theorem.
In trigonometry, the law of cosines relates the lengths of the sides of a triangle to the cosine of one of its angles. Using the standard notation, where gamma denotes the angle contained between sides of lengths a and b and opposite the side of length c, the law of cosines states:

$$c^2 = a^2 + b^2 - 2ab\cos\gamma$$

From this formula, we can see that if we have the parameters a, b and c, then we can find gamma, the angle between the sides a and b. Rearranging for gamma:

$$\gamma = \arccos\left(\frac{a^2 + b^2 - c^2}{2ab}\right)$$

The pictorial depiction of the same would look like the following:
Now, if gamma is less than or equal to 90 degrees (pi/2 radians), we consider the defect to be the gap between two fingers. By this point, we can easily derive the three sides a, b and c (see CODE), and from the cosine theorem we can derive gamma, the angle between two fingers. In the code, the angle computed at the far point lies between sides b and c and opposite side a, which is why it is written as acos((b**2 + c**2 - a**2)/(2*b*c)).
We convert gamma into degrees by multiplying by 57, since the acos() function returns the angle in radians (57 approximates 180/pi, which is about 57.3).
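As a quick sanity check of the formula and the conversion (using hypothetical side lengths of a 3-4-5 right triangle):

import math

a, b, c = 3.0, 4.0, 5.0  # hypothetical side lengths of a 3-4-5 triangle
gamma = math.acos((a**2 + b**2 - c**2) / (2 * a * b))  # angle opposite side c, in radians
print(math.degrees(gamma))  # 90.0 - math.degrees() is the exact form of the *57 trick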
We then check whether the angle is less than or equal to 90 degrees; if it is, we increase the value of count_defects by 1. The existence of such an angle denotes the presence of a defect, i.e. a gap between fingers.
After finding gamma, we draw a circle of radius 1 at the approximate farthest point of each defect, and the start and end points are joined by the lines drawn with cv2.line(). The circles will not appear along a uniform curve, as the farthest points do not lie on a straight line. Next, we display the number of defects using the function cv2.putText().
The parameters of cv2.circle() are:
- crop_img: It is the image on which the circle is to be drawn.
- far: It is the center coordinates of circle. The coordinates are represented as tuples of two values i.e. (X coordinate value, Y coordinate value).
- 1: It is the radius of circle.
- [0,0,255]: It is the color of border line of circle to be drawn in BGR index.
- -1: It is the thickness of the circle border line in px. Thickness of -1 px will fill the circle shape by the specified color.
The parameters of cv2.line() are:
- crop_img: It is the image on which line is to be drawn.
- start: It is the starting coordinates of line. The coordinates are represented as tuples of two values i.e. (X coordinate value, Y coordinate value).
- end: It is the ending coordinates of line. The coordinates are represented as tuples of two values i.e. (X coordinate value, Y coordinate value).
- [0,255,0]: It is the colour of the line to be drawn, in BGR index. Here, (0,255,0) denotes green.
- 2: It is the thickness of the line in px.
The parameters of cv2.putText() are:
- img: It is the image on which text is to be drawn.
- "Number : 2": It is the text string to be drawn on the image.
- (50,450): It is the coordinates of the bottom-left corner of the text string in the image. The coordinates are represented as tuples of two values i.e. (X coordinate value, Y coordinate value).
- cv2.FONT_HERSHEY_SIMPLEX: It denotes the font type, used in OpenCV.
- 1: It is the fontScale factor that is multiplied by the font-specific base size.
- (255,255,255): It is the color of text string to be drawn in BGR. Here, the colour is white.
- 1: It is the thickness of the line in px.
The fonts available in OpenCV are:
- FONT_HERSHEY_SIMPLEX
- FONT_HERSHEY_PLAIN
- FONT_HERSHEY_DUPLEX
- FONT_HERSHEY_COMPLEX
- FONT_HERSHEY_TRIPLEX
- FONT_HERSHEY_COMPLEX_SMALL
- FONT_HERSHEY_SCRIPT_SIMPLEX
- FONT_HERSHEY_SCRIPT_COMPLEX
If there are n defects, then there are n + 1 fingers under detection.
if count_defects == 1:
    cv2.putText(img, "Number : 2", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
elif count_defects == 2:
    cv2.putText(img, "Number : 3", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
elif count_defects == 3:
    cv2.putText(img, "Number : 4", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
elif count_defects == 4:
    cv2.putText(img, "Number : 5", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
elif count_defects == 5:
    cv2.putText(img, "Number : 6", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
else:
    cv2.putText(img, "Number : 1", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)

cv2.imshow('Defects', crop_img)
The number of defects will be displayed as follows, which will be rendered by the name of "Defects", using imshow():
Step 8: Now, after checking for the defects, it is time to terminate the while loop, close all the windows, and release our VideoCapture object.
To exit the program on a specified keyboard interrupt, type the following code:
k = cv2.waitKey(10) & 0xFF
if k == 27:
    break

cap.release()
cv2.destroyAllWindows()
Now the loop breaks when the key is entered, and exits the control out of the loop.
The function cap.release() closes the webcam input and prevents any resource errors. The function cv2.destroyAllWindows() destroys all the opened windows rendered by the imshow() functions and deallocates the memory used by the image vector arrays and frees them.
To know more about Convexity Defects, go here: Convexity Defects
Now your defects.py should look like the following.
import cv2
import numpy as np
import math

cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, img = cap.read()
    img = cv2.flip(img, 1)
    cv2.rectangle(img, (20, 20), (250, 250), (255, 0, 0), 3)
    crop_img = img[20:250, 20:250]
    cv2.imshow('Gesture', img)
    grey = cv2.cvtColor(crop_img, cv2.COLOR_BGR2GRAY)
    value = (35, 35)
    blurred = cv2.GaussianBlur(grey, value, 0)
    _, thresh1 = cv2.threshold(blurred, 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    cv2.imshow('Binary Image', thresh1)
    contours, hierarchy = cv2.findContours(thresh1.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    cnt = max(contours, key=lambda x: cv2.contourArea(x))
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(crop_img, (x, y), (x + w, y + h), (0, 0, 255), 0)
    hull = cv2.convexHull(cnt)
    drawing = np.zeros(crop_img.shape, np.uint8)
    cv2.drawContours(drawing, [cnt], 0, (0, 255, 0), 0)
    cv2.drawContours(drawing, [hull], 0, (0, 0, 255), 0)
    cv2.imshow('Contours', drawing)
    hull = cv2.convexHull(cnt, returnPoints=False)
    defects = cv2.convexityDefects(cnt, hull)
    count_defects = 0
    cv2.drawContours(thresh1, contours, -1, (0, 255, 0), 3)
    for i in range(defects.shape[0]):
        s, e, f, d = defects[i, 0]
        start = tuple(cnt[s][0])
        end = tuple(cnt[e][0])
        far = tuple(cnt[f][0])
        a = math.sqrt((end[0] - start[0])**2 + (end[1] - start[1])**2)
        b = math.sqrt((far[0] - start[0])**2 + (far[1] - start[1])**2)
        c = math.sqrt((end[0] - far[0])**2 + (end[1] - far[1])**2)
        angle = math.acos((b**2 + c**2 - a**2) / (2 * b * c)) * 57
        if angle <= 90:
            count_defects += 1
            cv2.circle(crop_img, far, 1, [0, 0, 255], -1)
        cv2.line(crop_img, start, end, [0, 255, 0], 2)
    if count_defects == 1:
        cv2.putText(img, "Number : 2", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
    elif count_defects == 2:
        cv2.putText(img, "Number : 3", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
    elif count_defects == 3:
        cv2.putText(img, "Number : 4", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
    elif count_defects == 4:
        cv2.putText(img, "Number : 5", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
    elif count_defects == 5:
        cv2.putText(img, "Number : 6", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
    else:
        cv2.putText(img, "Number : 1", (50, 450), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)
    cv2.imshow('Defects', crop_img)
    k = cv2.waitKey(10) & 0xFF
    if k == 27:
        break

cap.release()
cv2.destroyAllWindows()
Run the code in your Powershell/terminal using
python defects.py
In part 2 of this post, you will learn about collecting data through OpenCV, a demonstration of data collection, TensorFlow, Convolutional Neural Networks (CNNs) and much more, including how live prediction works. Stay tuned for part 2!
Project collaborators: