Detect body gestures to control RPG game using Mediapipe
Viet Hoang
Posted on February 3, 2023
If you have ever played Nintendo Switch games, you are likely familiar with Ring Fit Adventure, an exercise-based video game that encourages physical activity, or with Mario Tennis and its swing mode, which lets players swing the Joy-Con like a tennis racket. The combination of physical activity and fun gameplay makes these games an excellent choice for anyone seeking a unique and entertaining way to stay fit.
So this post will show you how to create an app that converts your body gestures into real-time game controls. It can't yet replace all the functions of a keyboard or controller, but we'll strive to improve it in future versions.
You can view the demo video here. I am utilizing this application to play The Legend of Zelda: Breath of the Wild.
This also has a function to control the steering wheel, allowing you to use it to control a racing game. You can view the demo video here.
Here is the full code of this project.
https://github.com/ngviethoang/body-gesture-to-keyboard-control
The rest of this post explains how the project works.
Outline
First, let's break this down into smaller problems. It will be easier to comprehend and address this issue.
- Detect human pose from the camera: We will apply the Mediapipe pose solution here to get the pose in real-time
- Detect pre-defined body gestures: Define body gestures from pose estimation output to detect when the conditions for the gesture are met.
- Trigger keyboard events corresponding to each gesture.
We will attempt to solve each problem and then incorporate them into the app.
Pose detection using Mediapipe
In this step, I will use PySide6 (Qt for Python) to build the app and add the Mediapipe pose solution to it.
First, install the packages this project needs: PySide6, mediapipe, opencv-python, and numpy.
pip install PySide6 opencv-python mediapipe numpy
Next, create a file to run the app.
import sys

from PySide6.QtCore import Qt, Slot
from PySide6.QtGui import QImage, QPixmap
from PySide6.QtWidgets import (
    QApplication,
    QComboBox,
    QHBoxLayout,
    QLabel,
    QMainWindow,
    QCheckBox,
    QVBoxLayout,
    QWidget,
    QFormLayout,
    QSlider,
    QPushButton,
)

class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        # Title and dimensions
        self.setWindowTitle("Pose Detection")
        self.setGeometry(100, 100, 900, 650)

if __name__ == "__main__":
    app = QApplication()
    w = Window()
    w.show()
    sys.exit(app.exec())
To run the camera and extract images from it, we need to run an independent thread. This thread will read the image output from the camera. We can then use Mediapipe to process and display this image.
Create a class that inherits from PySide6's QThread. This class will run the pose detection process.
from PySide6.QtCore import QThread

class Cv2Thread(QThread):
    def __init__(self, parent=None):
        QThread.__init__(self, parent)

    def run(self):
        pass
In this run() function, we will read an image from the camera and run MediaPipe's pose estimation.
You can see the mediapipe solution for Python here.
Here is the code to run the pose estimation. It also draws the landmarks for the detected pose.
The results variable is the output of the pose detection. It contains the body's landmarks, which we can use to detect gestures.
See the comments in the code for more information.
import cv2
import mediapipe as mp

# MediaPipe modules used inside run()
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_pose = mp.solutions.pose

def run(self):
    self.cap = cv2.VideoCapture(0)
    with mp_pose.Pose() as pose:
        while self.cap.isOpened():
            success, image = self.cap.read()
            if not success:
                print("Ignoring empty camera frame.")
                # If loading a video, use 'break' instead of 'continue'.
                continue
            # Recolor image to RGB
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            # To improve performance, optionally mark the image as not
            # writeable to pass by reference.
            image.flags.writeable = False
            # Make detection
            results = pose.process(image)
            # Recolor back to BGR
            image.flags.writeable = True
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
            # Draw landmark annotation on the image.
            mp_drawing.draw_landmarks(
                image,
                results.pose_landmarks,
                mp_pose.POSE_CONNECTIONS,
                landmark_drawing_spec=mp_drawing_styles.get_default_pose_landmarks_style(),
            )
            # Convert the image to RGB to display it
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            # Creating and scaling QImage
            h, w, ch = image.shape
            image = QImage(image.data, w, h, ch * w, QImage.Format_RGB888)
            image = image.scaled(640, 480, Qt.KeepAspectRatio)
            if cv2.waitKey(5) & 0xFF == 27:
                break
    sys.exit(-1)
The processing can now run on the thread, but the app still needs to display the processed image. Qt provides signals for this: the thread can use a Signal to send data to the main window.
from PySide6.QtCore import QThread, Signal

class Cv2Thread(QThread):
    update_frame = Signal(QImage)
In the run function, call emit on the signal to send each processed image.
def run(self):
    self.cap = cv2.VideoCapture(0)
    with mp_pose.Pose() as pose:
        while self.cap.isOpened():
            success, image = self.cap.read()
            ...
            # Creating and scaling QImage
            h, w, ch = image.shape
            image = QImage(image.data, w, h, ch * w, QImage.Format_RGB888)
            image = image.scaled(640, 480, Qt.KeepAspectRatio)
            # Emit signal
            self.update_frame.emit(image)
            if cv2.waitKey(5) & 0xFF == 27:
                break
    sys.exit(-1)
Now, back in the Window class, update the __init__ function to create the thread and receive the image data.
After receiving the image, we set it on the label and show it in the app.
class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        # Title and dimensions
        self.setWindowTitle("Pose Detection")
        self.setGeometry(100, 100, 900, 650)
        # Label that displays the camera frames
        self.camera_label = QLabel(self)
        self.setCentralWidget(self.camera_label)
        self.cv2_thread = Cv2Thread(self)
        self.cv2_thread.finished.connect(self.close)
        # Receive data from thread
        self.cv2_thread.update_frame.connect(self.setImage)
        # Start reading frames from the camera
        self.cv2_thread.start()

    @Slot(QImage)
    def setImage(self, image):
        self.camera_label.setPixmap(QPixmap.fromImage(image))
Now let's run the app and see the result. Don't forget to activate the virtualenv if you use one.
python window.py
You should now see the pose landmarks drawn on the camera image.
Detect body gestures
Using the coordinates of these landmarks, we will attempt to detect body gestures and emit events when they are identified.
The results contain 33 pose landmarks, as shown in the image below. We can use these landmarks to detect gestures.
Another approach is to train a machine learning (ML) model to classify each pose. As I'm new to this field, we will skip this for now and revisit it later to improve our solution.
You can check another solution here: Pose Classification
In this post, I will use a simple method for gesture detection.
Each gesture produces different angles between the landmarks. For example, when we are standing, the angle between the hip, knee, and ankle is close to 180 degrees. Likewise, when we curl an arm, the angle between the shoulder, elbow, and wrist landmarks is less than 45 degrees.
So, in the first step, we will get all landmarks that we need and calculate the angles between them for further calculation.
First, create a helper function for getting the right landmark.
def get_landmark_coordinates(landmarks, landmark):
    value = landmarks[landmark.value]
    return [
        value.x,
        value.y,
        value.z,
        value.visibility,
    ]
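To try the helper without a camera, here is a minimal sketch that uses stand-in objects instead of real MediaPipe output. FakePoseLandmark and fake_landmarks are made up for this illustration; they only mimic the attributes the helper reads (an enum with a .value index, and landmarks with x, y, z, and visibility).

```python
from enum import IntEnum
from types import SimpleNamespace


class FakePoseLandmark(IntEnum):
    # Stand-in for mp_pose.PoseLandmark (assumption: only two entries are needed here)
    LEFT_SHOULDER = 11
    RIGHT_SHOULDER = 12


def get_landmark_coordinates(landmarks, landmark):
    value = landmarks[landmark.value]
    return [value.x, value.y, value.z, value.visibility]


# Fake list of 33 landmarks with made-up values
fake_landmarks = [
    SimpleNamespace(x=0.0, y=0.0, z=0.0, visibility=0.0) for _ in range(33)
]
fake_landmarks[11] = SimpleNamespace(x=0.6, y=0.3, z=-0.1, visibility=0.98)

left_shoulder = get_landmark_coordinates(
    fake_landmarks, FakePoseLandmark.LEFT_SHOULDER
)
print(left_shoulder)  # [0.6, 0.3, -0.1, 0.98]
```

The returned list keeps x and y in the first two positions, which is what the angle calculation later in this post relies on.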
Then, extract the landmarks from results.
import mediapipe as mp
mp_pose = mp.solutions.pose
pose_landmarks = results.pose_landmarks.landmark
# Get coordinates
left_shoulder = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.LEFT_SHOULDER
)
right_shoulder = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.RIGHT_SHOULDER
)
left_elbow = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.LEFT_ELBOW
)
right_elbow = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.RIGHT_ELBOW
)
left_wrist = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.LEFT_WRIST
)
right_wrist = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.RIGHT_WRIST
)
left_hip = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.LEFT_HIP
)
right_hip = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.RIGHT_HIP
)
left_knee = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.LEFT_KNEE
)
right_knee = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.RIGHT_KNEE
)
left_ankle = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.LEFT_ANKLE
)
right_ankle = get_landmark_coordinates(
pose_landmarks, mp_pose.PoseLandmark.RIGHT_ANKLE
)
...
To calculate the angle between three points in a two-dimensional coordinate system, use the following example code:
import numpy as np

# calculate angle between line ab and bc
def calculate_angle(a, b, c):
    a = np.array(a)  # First
    b = np.array(b)  # Mid
    c = np.array(c)  # End
    radians = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(a[1] - b[1], a[0] - b[0])
    angle = np.abs(radians * 180.0 / np.pi)
    if angle > 180.0:
        angle = 360 - angle
    return angle
We can now use this function to calculate the angles of the body's landmarks.
left_shoulder_angle = calculate_angle(left_elbow, left_shoulder, left_hip)
right_shoulder_angle = calculate_angle(
right_elbow, right_shoulder, right_hip
)
left_elbow_angle = calculate_angle(left_shoulder, left_elbow, left_wrist)
right_elbow_angle = calculate_angle(
right_shoulder, right_elbow, right_wrist
)
left_hip_angle = calculate_angle(left_shoulder, left_hip, left_knee)
right_hip_angle = calculate_angle(right_shoulder, right_hip, right_knee)
left_knee_angle = calculate_angle(left_hip, left_knee, left_ankle)
right_knee_angle = calculate_angle(right_hip, right_knee, right_ankle)
left_hip_knee_angle = calculate_angle(right_hip, left_hip, left_knee)
right_hip_knee_angle = calculate_angle(left_hip, right_hip, right_knee)
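As a quick sanity check of the angle helper, the sketch below feeds it made-up coordinates for a straight leg and a leg bent at a right angle. The coordinate values are assumptions chosen for illustration, not real MediaPipe output. Note that only the x and y values (the first two entries) are used, so the result is the angle as seen in the 2D image.

```python
import numpy as np


def calculate_angle(a, b, c):
    a, b, c = np.array(a), np.array(b), np.array(c)
    radians = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(
        a[1] - b[1], a[0] - b[0]
    )
    angle = np.abs(radians * 180.0 / np.pi)
    if angle > 180.0:
        angle = 360 - angle
    return angle


# Made-up landmarks in the form [x, y, z, visibility]
hip = [0.5, 0.5, 0.0, 1.0]
knee = [0.5, 0.7, 0.0, 1.0]
ankle_standing = [0.5, 0.9, 0.0, 1.0]  # directly below the knee
ankle_bent = [0.7, 0.7, 0.0, 1.0]  # level with the knee

standing_angle = calculate_angle(hip, knee, ankle_standing)
bent_angle = calculate_angle(hip, knee, ankle_bent)
print(round(standing_angle), round(bent_angle))
```

The straight leg gives an angle of about 180 degrees and the bent leg about 90 degrees, which matches the intuition described earlier.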
We can use these angles to monitor our body's states when we move and emit events like curling our hands or walking, if detected.
We need an object to store the body's current states, such as walking and swinging hands, so that we can predict the current gesture.
Predict body gesture
I will use the squat exercise as an example in this post.
First, create a class called LegsState to manage the state of the body when we receive the angles above.
Set the property squat = False to keep track of the current state.
If the angles of both the left and right knees are smaller than a certain threshold (I set it to 155 degrees here), we change the property squat to True and emit the event. Otherwise, the property is set to False.
class LegsState:
    KNEE_UP_MAX_ANGLE = 155

    def __init__(self):
        self.squat = False

    def update(
        self,
        left_hip,
        right_hip,
        left_knee,
        right_knee,
        left_ankle,
        right_ankle,
        left_hip_angle,
        right_hip_angle,
        left_knee_angle,
        right_knee_angle,
    ):
        if (
            left_knee_angle < self.KNEE_UP_MAX_ANGLE
            and right_knee_angle < self.KNEE_UP_MAX_ANGLE
        ):
            if not self.squat:
                self.squat = True
                print('squat')
                # emit event squat here
        else:
            self.squat = False
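To see why the squat flag matters, here is a reduced sketch of the same idea that takes only the two knee angles and reports events through a callback. SquatDetector, on_event, and the angle sequence are hypothetical names and values for this illustration, not part of the project code.

```python
class SquatDetector:
    KNEE_UP_MAX_ANGLE = 155

    def __init__(self, on_event):
        self.squat = False
        self.on_event = on_event  # called once each time a squat starts

    def update(self, left_knee_angle, right_knee_angle):
        both_bent = (
            left_knee_angle < self.KNEE_UP_MAX_ANGLE
            and right_knee_angle < self.KNEE_UP_MAX_ANGLE
        )
        if both_bent:
            if not self.squat:
                self.squat = True
                self.on_event("squat")
        else:
            self.squat = False


events = []
detector = SquatDetector(events.append)

# Made-up knee angles for six consecutive frames
frames = [(175, 176), (150, 148), (120, 118), (140, 150), (172, 174), (130, 131)]
for left, right in frames:
    detector.update(left, right)

print(events)  # ['squat', 'squat']
```

Frames two to four all satisfy the threshold, but the event fires only on frame two. It fires again on frame six because the legs were straightened in between. Without the flag, every frame of a held squat would emit a new event.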
Trigger keyboard events
The next step is to create a new class that receives events such as squatting or walking and turns them into keyboard events.
First, let's create a class called CommandProcessor that presses and releases the correct key for each event.
In the add_command function, we pass a command_key_mappings dictionary to look up the key associated with a command. If no key is found, the function does nothing.
A pressed key must be released after some time, from 100 milliseconds to 1 second or more. However, many events can arrive within a single second, so we need a way to avoid pressing a key that is already pressed and to release it after a certain amount of time.
We will use a timer to release the currently pressed key separately. We also need to remember the most recently pressed key to avoid duplicate presses.
from datetime import datetime
from pynput.keyboard import Controller
from threading import Timer

class CommandProcessor:
    def __init__(self):
        self.keyboard = Controller()
        self.pressing_key = None
        self.pressing_timer = None

    def release_previous_key(self):
        if self.pressing_key:
            previous_key = self.pressing_key["key"]
            # print(f"releasing {previous_key}")
            self.keyboard.release(previous_key)
            self.pressing_key = None

    def add_command(
        self,
        command,
        keyboard_enabled: bool,
        command_key_mappings: dict,
        pressing_timer_interval: float,
    ):
        now = datetime.now()
        if keyboard_enabled:
            if command in command_key_mappings:
                key = command_key_mappings[command]
                # get current pressing key
                previous_key = None
                if self.pressing_key:
                    previous_key = self.pressing_key["key"]
                # clear old timer
                if self.pressing_timer and self.pressing_timer.is_alive():
                    # print("cancel timer")
                    self.pressing_timer.cancel()
                # new action
                if previous_key != key:
                    self.release_previous_key()
                    if key:
                        print("pressing", key)
                        self.keyboard.press(key)
                if key:
                    # create new timer
                    self.pressing_timer = Timer(
                        pressing_timer_interval,
                        self.release_previous_key,
                    )
                    self.pressing_timer.start()
                    self.pressing_key = dict(key=key, time=now)
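The press-once, release-later behaviour can be tried without touching the real keyboard. The sketch below is a simplified variant of the idea, not the project's class: FakeKeyboard is a stand-in that only records calls, and KeyHolder, the 0.1 second interval, and the added lock are assumptions made for this illustration.

```python
import time
from threading import Lock, Timer


class FakeKeyboard:
    """Stand-in for a keyboard controller; it only records what was called."""

    def __init__(self):
        self.log = []

    def press(self, key):
        self.log.append(("press", key))

    def release(self, key):
        self.log.append(("release", key))


class KeyHolder:
    def __init__(self, keyboard, interval):
        self.keyboard = keyboard
        self.interval = interval
        self.current_key = None
        self.timer = None
        self.lock = Lock()  # the timer callback runs on another thread

    def _release(self):
        with self.lock:
            if self.current_key is not None:
                self.keyboard.release(self.current_key)
                self.current_key = None

    def trigger(self, key):
        # Cancel the pending release so the key stays held
        if self.timer is not None:
            self.timer.cancel()
        with self.lock:
            if key != self.current_key:
                if self.current_key is not None:
                    self.keyboard.release(self.current_key)
                self.keyboard.press(key)
                self.current_key = key
        # Schedule a fresh release
        self.timer = Timer(self.interval, self._release)
        self.timer.daemon = True
        self.timer.start()


keyboard = FakeKeyboard()
holder = KeyHolder(keyboard, interval=0.1)
for _ in range(3):  # the same event arrives three times in quick succession
    holder.trigger("w")
    time.sleep(0.02)
time.sleep(0.4)  # wait for the last timer to fire
print(keyboard.log)
```

Three events in a row produce a single press, and the key is released only once, after no new event has arrived for the interval.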
Now, let's create a class called Events that receives all the events and forwards them to the CommandProcessor class above by calling its add_command method.
In the Events class, we can create multiple CommandProcessor instances to press keys simultaneously. We route each command to the appropriate processor based on the command name, as shown in the code below.
class Events:
    def __init__(
        self,
        keyboard_enabled,
        cross_cmd_enabled,
        pressing_timer_interval,
        d1_pressing_timer_interval,
        d2_pressing_timer_interval,
        command_key_mappings,
    ):
        self.keyboard_enabled = keyboard_enabled
        self.cross_cmd_enabled = cross_cmd_enabled
        self.command_key_mappings = command_key_mappings
        self.pressing_timer_interval = pressing_timer_interval
        self.d1_pressing_timer_interval = d1_pressing_timer_interval
        self.d2_pressing_timer_interval = d2_pressing_timer_interval
        self.cmd_process = CommandProcessor()
        # process cmd related to direction (left, right)
        self.d1_cmd_process = CommandProcessor()  # walk
        self.d2_cmd_process = CommandProcessor()  # tilt face

    # Add command to pipeline
    def add(self, command):
        # Split command by type name
        if "walk" in command or "d1" in command:
            self.d1_cmd_process.add_command(
                command,
                self.keyboard_enabled,
                self.command_key_mappings,
                self.d1_pressing_timer_interval,
            )
        elif "face" in command or "d2" in command:
            self.d2_cmd_process.add_command(
                command,
                self.keyboard_enabled,
                self.command_key_mappings,
                self.d2_pressing_timer_interval,
            )
        else:
            self.cmd_process.add_command(
                command,
                self.keyboard_enabled,
                self.command_key_mappings,
                self.pressing_timer_interval,
            )
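The routing rule can be looked at in isolation. The sketch below extracts it into a small function and applies it to a hypothetical command_key_mappings dictionary. The command names and keys here are assumptions for illustration; the real mapping depends on the game you want to control.

```python
def route(command):
    """Return which processor a command goes to, using the same substring rule."""
    if "walk" in command or "d1" in command:
        return "d1"
    if "face" in command or "d2" in command:
        return "d2"
    return "default"


# Hypothetical mapping from command names to keys
command_key_mappings = {
    "walk_left": "a",
    "walk_right": "d",
    "face_tilt_left": "q",
    "squat": "s",
}

routes = {command: route(command) for command in command_key_mappings}
print(routes)
```

Because each group has its own processor and its own timer, a walking key can stay held while an action key such as the squat key is pressed and released independently.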
The full code of my implementation can be found in the body directory.
Conclusion
This project is still simple and has plenty of room for improvement. I would appreciate any opinions or feedback you may have.
Hope you like it 👋