Complete OpenPose guide

OpenPose is one of the most popular pose estimation libraries. Its 2D and 3D keypoint detection features are widely used by data science researchers all over the world.

Here is an analysis of its features, application fields, commercial licensing cost and alternatives. It should help you decide whether OpenPose is the right choice for your artificial intelligence project.

What is OpenPose?

OpenPose is a real-time multi-person keypoint detection library for body, foot, face, and hand estimation. It can detect 135 keypoints in total.

It is a deep learning-based approach that can infer the 2D location of key body joints (such as elbows, knees, shoulders, and hips), facial landmarks (such as eyes, nose, mouth), and hand keypoints (such as fingertips, wrist, and palm) from RGB images or videos.

The library was created by a group of researchers from Carnegie Mellon University and is now maintained by two of its initial creators.

OpenPose is known for its robustness in multi-person pose estimation settings and won the COCO 2016 Keypoints Challenge.

How does OpenPose work?

OpenPose first extracts features from the image using the initial layers of a convolutional network (the first 10 layers of VGG-19 in the original paper).

These features are then fed into two parallel branches of convolutional layers. One branch predicts 18 confidence maps, each representing a specific part of the human pose skeleton.

At the same time, the other branch predicts 38 Part Affinity Field (PAF) channels, that is, 19 two-dimensional vector fields with an x and a y component each, which encode the degree of association between pairs of body parts. Subsequent stages refine the predictions of both branches.

The confidence maps provide candidate locations for each body part, which form the nodes of bipartite graphs between pairs of body parts. The PAF values score the edges of these graphs, so that weaker connections can be identified and eliminated.

By following these steps, OpenPose estimates human pose skeletons and assigns each one to an individual in the image.
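To make the PAF step more concrete, here is a minimal pure-Python sketch of the line-integral score described in the OpenPose paper for rating a candidate limb: sample points along the segment between two detected parts and average the dot product between the PAF vector at each point and the segment's unit direction. The grid layout (`paf_x[row][col]`, `paf_y[row][col]`), the function name `paf_score` and the number of samples are illustrative assumptions, not OpenPose's internal code.

```python
import math

def paf_score(paf_x, paf_y, p1, p2, num_samples=10):
    """Average alignment between one limb's PAF and the segment p1 -> p2.

    paf_x, paf_y: 2D lists indexed as [row][col] holding the x and y
    components of the Part Affinity Field (assumed layout).
    p1, p2: (x, y) pixel coordinates of two candidate body parts.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1) if num_samples > 1 else 0.0
        # Nearest grid cell to the sampled point on the segment
        col = int(round(p1[0] + t * dx))
        row = int(round(p1[1] + t * dy))
        total += paf_x[row][col] * ux + paf_y[row][col] * uy
    return total / num_samples


# Toy field whose vectors all point to the right (+x)
W, H = 8, 8
paf_x = [[1.0] * W for _ in range(H)]
paf_y = [[0.0] * W for _ in range(H)]
print(paf_score(paf_x, paf_y, (1, 4), (6, 4)))  # 1.0: limb aligned with the field
print(paf_score(paf_x, paf_y, (3, 1), (3, 6)))  # 0.0: limb perpendicular to it
```

A high score means the field "flows" along the candidate limb, which is why it works as an edge weight in the bipartite graphs.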

OpenPose Pipeline Steps

So in summary, OpenPose performs these tasks in sequence:

The entire input, whether a still image or a video frame, is fed to the network.
One branch of a two-branch Convolutional Neural Network (CNN) predicts confidence maps, which are used to detect body parts.
The other branch estimates Part Affinity Fields (PAFs), which encode the association between different body parts.
A set of bipartite matchings is then computed to link body part candidates.
Finally, the matched body parts are assembled into full-body poses for all individuals present in the image.
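The bipartite matching step can be approximated greedily: score every pair of candidates for one limb type, then accept the highest-scoring pairs while using each candidate at most once. The sketch below assumes the scores have already been computed (for example with a PAF line integral); the function name `greedy_limb_matching` and the `min_score` threshold are hypothetical, and this is a simplified illustration rather than OpenPose's exact implementation.

```python
def greedy_limb_matching(scores, min_score=0.05):
    """Match candidates of part A to candidates of part B for one limb type.

    scores: dict mapping (index_in_A, index_in_B) -> association score.
    Returns a list of (a, b, score), each candidate used at most once.
    """
    used_a, used_b = set(), set()
    matches = []
    # Consider the strongest connections first
    for (a, b), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s < min_score:
            break  # all remaining pairs are weaker still
        if a in used_a or b in used_b:
            continue  # one of the two parts is already connected
        used_a.add(a)
        used_b.add(b)
        matches.append((a, b, s))
    return matches


# Two necks (A) and two right shoulders (B) detected in the same image
scores = {(0, 0): 0.9, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.8}
print(greedy_limb_matching(scores))  # [(0, 0, 0.9), (1, 1, 0.8)]
```

Repeating this for every limb type and chaining the matches through shared parts (neck, hips, and so on) yields one skeleton per person.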

OpenPose features

OpenPose allows computer science professionals across the globe to use a vast selection of features for different computer vision applications.

2D real-time multi-person keypoint detection

2D human pose estimation is one of the most appreciated tasks that the OpenPose model can perform. Here are a few frequently used estimations that can be achieved with OpenPose:

15, 18 or 25-keypoint body/foot keypoint estimation, including 6 foot keypoints. Runtime invariant to the number of detected people.
2x21-keypoint hand keypoint estimation. Runtime depends on the number of detected people.
70-keypoint face keypoint estimation. Runtime depends on the number of detected people. See OpenPose Training for a runtime invariant alternative.
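OpenPose stores each person's keypoints as a flat list of (x, y, confidence) triplets, so a 25-keypoint body pose is 75 numbers, a 21-keypoint hand is 63 and a 70-keypoint face is 210. The helper below regroups such a list and attaches part names. The function names are hypothetical; the name order follows the BODY_25 layout described in OpenPose's output documentation, where the last six entries are the foot keypoints.

```python
BODY_25_NAMES = [
    "Nose", "Neck", "RShoulder", "RElbow", "RWrist", "LShoulder", "LElbow",
    "LWrist", "MidHip", "RHip", "RKnee", "RAnkle", "LHip", "LKnee", "LAnkle",
    "REye", "LEye", "REar", "LEar", "LBigToe", "LSmallToe", "LHeel",
    "RBigToe", "RSmallToe", "RHeel",
]

def to_triplets(flat):
    """Split [x0, y0, c0, x1, y1, c1, ...] into a list of (x, y, c) tuples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

def name_body25(flat):
    """Map a 75-value BODY_25 pose to {part_name: (x, y, confidence)}."""
    triplets = to_triplets(flat)
    if len(triplets) != len(BODY_25_NAMES):
        raise ValueError(f"expected 25 keypoints, got {len(triplets)}")
    return dict(zip(BODY_25_NAMES, triplets))


pose = name_body25([0.0] * 75)
print(len(pose), pose["MidHip"])  # 25 (0.0, 0.0, 0.0)
print(BODY_25_NAMES[19:])         # the six foot keypoints
```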

3D real-time single-person keypoint detection

3D pose estimation is another OpenPose feature that makes this a very powerful library of algorithms.

3D triangulation from multiple single views.
Synchronization of Flir cameras handled.
Compatible with Flir/Point Grey cameras.
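Triangulation recovers a 3D point from its 2D detections in two or more calibrated views. Below is a minimal linear (DLT-style) sketch in pure Python: each view with a 3x4 projection matrix P contributes two linear equations, and the resulting least-squares system is solved through its 3x3 normal equations. It illustrates the technique under those assumptions and is not OpenPose's own 3D module; the function names and camera matrices are hypothetical.

```python
def _solve3(m, b):
    """Solve the 3x3 system m @ x = b with Gaussian elimination (partial pivoting)."""
    a = [row[:] + [bi] for row, bi in zip(m, b)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / a[r][r]
    return x

def triangulate(projections, points_2d):
    """Linear least-squares triangulation of one keypoint seen in several views."""
    rows, rhs = [], []
    for P, (u, v) in zip(projections, points_2d):
        for coord, k in ((u, 0), (v, 1)):
            # coord * (P row 3) - (P row k), applied to (X, Y, Z, 1), must equal 0
            r = [coord * P[2][j] - P[k][j] for j in range(4)]
            rows.append(r[:3])
            rhs.append(-r[3])
    # Normal equations: (A^T A) X = A^T b
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    atb = [sum(row[i] * bi for row, bi in zip(rows, rhs)) for i in range(3)]
    return _solve3(ata, atb)

def project(P, X):
    """Project a 3D point to pixels with a 3x4 camera matrix P."""
    h = [sum(P[i][j] * c for j, c in enumerate(list(X) + [1.0])) for i in range(3)]
    return (h[0] / h[2], h[1] / h[2])


# Two hypothetical cameras: one at the origin, one shifted 1 unit along x
P1 = [[500, 0, 320, 0], [0, 500, 240, 0], [0, 0, 1, 0]]
P2 = [[500, 0, 320, -500], [0, 500, 240, 0], [0, 0, 1, 0]]
X_true = (0.2, -0.1, 3.0)
X_est = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
print([round(c, 6) for c in X_est])  # [0.2, -0.1, 3.0]
```

With noisy detections the two views no longer agree exactly, and the least-squares solution gives the point that best fits all equations.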

Calibration toolbox

Estimation of distortion, intrinsic, and extrinsic camera parameters.
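These three parameter groups map onto the standard pinhole camera model: extrinsics (rotation R, translation t) move a world point into the camera frame, distortion coefficients bend the normalized image coordinates, and intrinsics (focal lengths fx, fy and principal point cx, cy) convert them to pixels. The sketch below keeps only the first two radial terms, k1 and k2, of the common Brown-Conrady distortion model; the function and parameter names are illustrative assumptions, and real calibration files may store more coefficients.

```python
def project_point(X, R, t, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a 3D world point to pixels with a pinhole model plus radial distortion."""
    # Extrinsics: world frame -> camera frame
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division to normalized image coordinates
    x, y = xc[0] / xc[2], xc[1] / xc[2]
    # Radial distortion (first two terms only)
    r2 = x * x + y * y
    scale = 1 + k1 * r2 + k2 * r2 * r2
    xd, yd = x * scale, y * scale
    # Intrinsics: normalized coordinates -> pixels
    return (fx * xd + cx, fy * yd + cy)


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(project_point((0.0, 0.0, 2.0), I3, (0, 0, 0), 600, 600, 320, 240))  # (320.0, 240.0)
# A negative k1 (barrel distortion) pulls off-center points toward the image center
print(project_point((0.5, 0.0, 2.0), I3, (0, 0, 0), 600, 600, 320, 240, k1=-0.1))
```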

Single-person tracking for further speedup or visual smoothing.
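OpenPose's own tracker is not reproduced here, but the visual-smoothing idea can be illustrated with a simple exponential moving average applied to one person's keypoints, frame by frame. This is a swapped-in technique for illustration; the `alpha` factor and the rule of keeping the previous estimate when a keypoint has zero confidence are assumptions made for the sketch.

```python
def smooth_keypoints(prev, current, alpha=0.5):
    """Blend the current (x, y, c) keypoints with the previous smoothed ones.

    alpha=1.0 keeps only the current frame; smaller values smooth more.
    Keypoints with zero confidence (not detected) keep their previous value.
    """
    if prev is None:
        return list(current)
    out = []
    for (px, py, pc), (cx, cy, cc) in zip(prev, current):
        if cc == 0:
            out.append((px, py, pc))
        else:
            out.append((alpha * cx + (1 - alpha) * px,
                        alpha * cy + (1 - alpha) * py,
                        cc))
    return out


# One keypoint over three frames; it is missed in the last frame
frames = [[(100.0, 200.0, 0.9)], [(110.0, 200.0, 0.8)], [(0.0, 0.0, 0.0)]]
state = None
for kp in frames:
    state = smooth_keypoints(state, kp)
    print(state)
```

The trade-off is latency: lower `alpha` values reduce jitter but make the skeleton lag behind fast movements.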

OpenPose input

The input can be an image, a video, a webcam, a Flir/Point Grey camera or an IP camera, and you can add your own custom input source (e.g., a depth camera). This means you can estimate human movement in real time as well as analyze still images.

OpenPose output

Basic image + keypoint display/saving (PNG, JPG, AVI, …), keypoint saving (JSON, XML, YML, …), keypoints as array class, and support to add your own custom output code (e.g., some fancy UI).

OpenPose can output the keypoints as 2D coordinates, 3D coordinates, or heatmap values, providing flexibility for different applications.
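When JSON output is enabled, OpenPose writes one file per frame containing a `people` list, in which each person carries flat arrays such as `pose_keypoints_2d`. The sketch below parses such content with Python's standard `json` module and keeps only confident keypoints. The sample string is a shortened, hypothetical frame (a real BODY_25 pose has 75 values), and the confidence threshold is an arbitrary choice.

```python
import json

def confident_points(frame_json, min_conf=0.3):
    """Return, for each person, the (x, y) of body keypoints above min_conf."""
    frame = json.loads(frame_json)
    people = []
    for person in frame.get("people", []):
        flat = person.get("pose_keypoints_2d", [])
        # Walk the flat list three values at a time: x, y, confidence
        pts = [(flat[i], flat[i + 1])
               for i in range(0, len(flat) - 2, 3)
               if flat[i + 2] >= min_conf]
        people.append(pts)
    return people


sample = '{"version": 1.3, "people": [{"pose_keypoints_2d": [10, 20, 0.9, 0, 0, 0, 30, 40, 0.5]}]}'
print(confident_points(sample))  # [[(10, 20), (30, 40)]]
```

For a real run you would read each file with `open()` and pass its content to the same function.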

OpenPose OS

Ubuntu (20, 18, 16, 14), Windows (10, 8), Mac OSX, Nvidia TX2.

OpenPose hardware compatibility

CUDA (Nvidia GPU), OpenCL (AMD GPU), and non-GPU (CPU-only) versions.

OpenPose APIs

OpenPose provides a command-line demo as well as C++ and Python APIs, and it is built on top of the Caffe deep learning framework. Its keypoint output can then be used with other machine learning libraries and frameworks such as TensorFlow and PyTorch.

OpenPose applications

Before we look at the areas where the OpenPose human pose estimation algorithm is used, let’s first review the most important tasks you can do with OpenPose.

Multi-person pose estimation

OpenPose can detect the poses of multiple people in the same image or video stream simultaneously, making it ideal for applications such as action recognition, gesture recognition, and human-computer interaction.

Real-time performance

OpenPose can process images and videos in real time on modern GPUs, making it suitable for real-time applications such as sports analysis, gaming, and virtual reality.

Accurate keypoint detection

OpenPose can detect key body, face, and hand keypoints with high accuracy, even in challenging scenarios such as occlusion and cluttered backgrounds.

OpenPose has a wide range of applications in various fields. Here are some examples from different domains.

OpenPose in different industries

Thanks to its outstanding ability to detect and track human poses, OpenPose has become a computer vision staple in many different industries.

OpenPose for sports analysis

The OpenPose algorithm can be used for many different sports applications, such as technique analysis, injury prevention and gaming.

Human kinetics analysis

Analyzing movements and techniques of athletes to improve their performance in sports like basketball, tennis, and golf.
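Technique analysis usually starts from joint angles computed from three keypoints. For example, the elbow angle is the angle at the elbow between the upper arm (elbow to shoulder) and the forearm (elbow to wrist). Here is a minimal sketch in pure Python; the helper name `joint_angle` and the coordinates are hypothetical.

```python
import math

def joint_angle(a, b, c):
    """Angle ABC in degrees, where b is the joint (e.g. shoulder, elbow, wrist)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must be distinct")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding errors
    return math.degrees(math.acos(cos_angle))


shoulder, elbow, wrist = (100, 100), (100, 200), (200, 200)
print(joint_angle(shoulder, elbow, wrist))  # 90.0
```

Tracking such angles over a video shows, for instance, how far an athlete's arm extends during a swing.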

Injury prevention

Identifying improper posture or movement that could lead to injuries in sports like running, weightlifting, and football.

Gaming

Using motion tracking to control game characters using the player’s body movements, as seen in games like Kinect Sports and Just Dance.

OpenPose for robotics

As you might imagine, OpenPose has multiple applications within the robotics industry.

Human-Robot interaction

Developing robots that can interact with humans using natural body movements, like in personal assistance robots, factory automation, and social robots.

Object manipulation

Controlling robotic arms using hand and finger movements detected by OpenPose, like in manufacturing and assembly line robots.

Gesture recognition

Detecting and recognizing human gestures, like waving, pointing, and hand signals, to control robots, like in home automation and virtual assistants.
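Simple gestures can often be recognized with rules on keypoint positions. As an illustration, the sketch below flags a raised hand when a wrist is above the nose, remembering that image y-coordinates grow downward. The BODY_25 indices (0 = Nose, 4 = RWrist, 7 = LWrist) follow OpenPose's body layout; the rule itself, the function name and the confidence threshold are assumptions for the example.

```python
NOSE, R_WRIST, L_WRIST = 0, 4, 7  # BODY_25 indices

def raised_hands(keypoints, min_conf=0.3):
    """Return which hands are raised above the nose.

    keypoints: list of 25 (x, y, confidence) tuples for one person.
    """
    nose = keypoints[NOSE]
    if nose[2] < min_conf:
        return []  # no reliable reference point
    raised = []
    for name, idx in (("right", R_WRIST), ("left", L_WRIST)):
        x, y, c = keypoints[idx]
        if c >= min_conf and y < nose[1]:  # smaller y means higher in the image
            raised.append(name)
    return raised


person = [(0.0, 0.0, 0.0)] * 25
person[NOSE] = (320.0, 100.0, 0.9)
person[R_WRIST] = (280.0, 60.0, 0.8)   # above the nose
person[L_WRIST] = (360.0, 250.0, 0.8)  # below the nose
print(raised_hands(person))  # ['right']
```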

OpenPose for healthcare

Healthcare is another area where OpenPose can help with many tasks.

Physical therapy

Monitoring patients’ movements during rehabilitation exercises and providing real-time feedback to improve their posture and technique.
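For rehabilitation exercises, a per-frame joint angle (for instance a knee angle computed from hip, knee and ankle keypoints) can be turned into feedback such as a repetition count. The sketch below counts repetitions with two thresholds (hysteresis), so that jitter around a single threshold is not counted twice. The function name, thresholds and angle values are made up for illustration.

```python
def count_reps(angles, low=70.0, high=160.0):
    """Count flex-extend cycles in a sequence of joint angles (degrees).

    A repetition is counted when the angle drops below `low` (flexed)
    and then rises back above `high` (extended).
    """
    reps = 0
    flexed = False
    for a in angles:
        if not flexed and a < low:
            flexed = True
        elif flexed and a > high:
            flexed = False
            reps += 1
    return reps


knee_angles = [170, 150, 100, 65, 68, 72, 120, 165, 170, 90, 60, 140, 175]
print(count_reps(knee_angles))  # 2
```

The value 72 that briefly crosses the low threshold does not start a new repetition, which is the point of using two thresholds.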

Elderly care

Detecting falls and monitoring the activities of elderly people in their homes using OpenPose-based cameras.
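A very naive fall heuristic, shown only to make the idea concrete: follow the vertical position of the mid-hip keypoint (BODY_25 index 8) and raise an alert when it drops by more than a fraction of the person's height within a short time window. Real fall detectors are considerably more robust; the function name, thresholds and window size here are assumptions.

```python
def detect_fall(hip_y, body_height, fps, drop_ratio=0.4, window_s=0.5):
    """Return the first frame index with a fast downward hip drop, else None.

    hip_y: per-frame y-coordinate of the mid-hip keypoint (pixels, y grows downward).
    body_height: approximate person height in pixels (e.g. from the keypoint extent).
    """
    window = max(1, int(round(fps * window_s)))  # frames per time window
    threshold = drop_ratio * body_height
    for i in range(window, len(hip_y)):
        if hip_y[i] - hip_y[i - window] > threshold:
            return i
    return None


# 10 fps: hip steady at y=300, then drops to y=420 within a few frames
hips = [300] * 10 + [330, 370, 410, 420, 420]
print(detect_fall(hips, body_height=250, fps=10))  # 12
```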

Surgery

Providing surgeons with real-time feedback on the positioning and movement of their hands during surgical procedures.

OpenPose for security and surveillance

When it comes to security and surveillance, OpenPose has many applications centered on detecting and analyzing people.

Intrusion detection

Detecting and tracking human movements in restricted areas or identifying suspicious activities in real-time.
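A common building block for intrusion detection is checking whether a person's keypoint, such as a foot or the mid-hip, lies inside a restricted polygon drawn on the camera image. The sketch below uses the standard ray-casting test; the zone coordinates and function name are hypothetical.

```python
def point_in_polygon(pt, polygon):
    """Ray-casting test: True if pt lies inside the polygon (list of (x, y) vertices)."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from pt cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


restricted_zone = [(400, 300), (640, 300), (640, 480), (400, 480)]
feet = [(450.0, 420.0), (120.0, 410.0)]  # foot keypoints of two different people
for foot in feet:
    print(foot, point_in_polygon(foot, restricted_zone))
```

Only the first person's foot falls inside the zone, so only that person would trigger an alert.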

Crowd monitoring

Analyzing crowd behavior, detecting anomalies, and providing insights for crowd management and public safety.

Perimeter security

Monitoring and analyzing human presence along the perimeter of secure areas, detecting unauthorized entry attempts or potential breaches.

Crowd behavior analysis

Analyzing crowd dynamics, crowd density, and movement patterns in crowded public spaces, assisting in crowd management, event planning, and emergency response.

Traffic surveillance

Tracking and analyzing pedestrian movements at intersections, crosswalks, or public transportation hubs, facilitating traffic management and improving pedestrian safety.

OpenPose for entertainment

OpenPose is used by the entertainment industry for various applications.

Virtual reality

Tracking body movements to provide an immersive experience in virtual reality environments, like in VR games and simulations.

Animation

Capturing the motion of actors’ bodies and facial expressions to create realistic and expressive animated characters.

Film and TV

Tracking actors’ movements during motion capture sessions and applying them to digital characters in movies and TV shows.

OpenPose for retail and e-commerce

Virtual try-on

Helping customers virtually try on clothes, accessories, or makeup, providing a more personalized and engaging shopping experience.

Customer behavior analysis

Tracking and analyzing customers’ movements within a store, allowing retailers to optimize store layouts and product placements.

How much does OpenPose cost?

OpenPose is freely available for non-commercial use and may be redistributed under these conditions.

The license agreement covers noncommercial research by academic or non-profit organizations only.

A non-exclusive commercial license is also available. It requires a non-refundable annual royalty of USD 25,000.

Note that the commercial license cannot be used in the field of sports.

How to use OpenPose?

The code base is open source on GitHub and is very well documented.

You can read the official installation documentation.

Install OpenPose

The first step is to install OpenPose on your system. OpenPose is available for various platforms, including Windows, Linux, and macOS.

You can download the latest release from the official GitHub repository or build OpenPose from source.

The package includes pre-trained models and configurations that are ready to use, but can also be further customized according to your application needs.

Prepare the input data

OpenPose requires input data in the form of images or video streams. The input data can be captured using a camera or loaded from a file.

Preprocessing the data before feeding it to OpenPose, for example by resizing, cropping, or filtering, can help improve the speed and accuracy of the model.
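OpenPose also rescales every frame to its network input size itself, controlled by the `--net_resolution` flag (default `-1x368`, where `-1` lets OpenPose choose a width that preserves the aspect ratio, and both dimensions should be multiples of 16). When you resize or crop beforehand, it helps to know which network width a frame will map to. The helper below approximates that calculation by rounding to the nearest multiple of 16; it is a sketch, not OpenPose's implementation, and the function name is hypothetical.

```python
def net_width_for(input_w, input_h, net_h=368, multiple=16):
    """Estimate a network input width that keeps the input aspect ratio.

    Rounds to the nearest multiple of `multiple` (approximation only).
    """
    if net_h % multiple != 0:
        raise ValueError("net_h should be a multiple of 16")
    ideal = input_w * net_h / input_h
    return max(multiple, int(round(ideal / multiple)) * multiple)


print(net_width_for(1280, 720))  # 656
print(net_width_for(640, 480))   # 496
```

Cropping to the region where people actually appear means more of those network pixels are spent on the people themselves.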

Configure OpenPose

Configuring OpenPose is an essential step in optimizing the model’s performance and accuracy. OpenPose provides various configuration options that can be adjusted.

The configuration options include model type, output format, resolution, and keypoint detection threshold. These options can be selected according to your application’s specific requirements to achieve the best results.
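In the OpenPose demo, these options correspond to command-line flags such as `--model_pose` (for example `BODY_25` or `COCO`), `--net_resolution`, `--render_threshold` and `--write_json`, plus `--face` and `--hand` to enable the extra detectors. The helper below only assembles an argument list that could be passed to Python's `subprocess.run`. The binary path assumes a default source build on Ubuntu, the option values are placeholders, the function name is hypothetical, and you should check the flag list of your OpenPose version before relying on it.

```python
def openpose_args(video, json_dir, model_pose="BODY_25", net_resolution="-1x368",
                  render_threshold=0.05, face=False, hand=False,
                  binary="./build/examples/openpose/openpose.bin"):
    """Assemble an OpenPose demo command line (does not run it)."""
    args = [
        binary,
        "--video", video,
        "--model_pose", model_pose,
        "--net_resolution", net_resolution,
        "--render_threshold", str(render_threshold),
        "--write_json", json_dir,  # one JSON file per frame
    ]
    if face:
        args.append("--face")
    if hand:
        args.append("--hand")
    return args


cmd = openpose_args("examples/media/video.avi", "output_json/", hand=True)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the options in one function makes it easy to record exactly which configuration produced a given set of results.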

Run OpenPose

Once the input data is prepared and the configuration options are set, OpenPose can be run on the data. It analyzes each image or video frame and detects the keypoints of every person, that is, the positions of the various body parts; their orientation and movement can then be derived from these positions across frames.

Visualize the output

The final step is to visualize the output of OpenPose. OpenPose can save results in formats such as JSON, XML, and YML, which can be used to display the detected keypoints in real time or for post-processing analysis. The output can be visualized using various tools, such as OpenCV, Matplotlib, or Unity.
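Because the keypoints are plain coordinates, visualization mostly means drawing line segments between connected parts. The sketch below keeps a subset of limb pairs inferred from the BODY_25 part order (neck, nose, shoulders, elbows, wrists, hips, knees, ankles), skips limbs with low-confidence endpoints, and writes a minimal SVG using only the standard library. With OpenCV you could draw the same segments with `cv2.line`. The limb list, threshold and function names are assumptions for illustration.

```python
# Subset of BODY_25 limbs as (index_a, index_b) pairs: torso, head, arms, legs
LIMBS = [(1, 8), (1, 0), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7),
         (8, 9), (9, 10), (10, 11), (8, 12), (12, 13), (13, 14)]

def skeleton_segments(keypoints, min_conf=0.1):
    """Return ((x1, y1), (x2, y2)) segments whose endpoints are both confident."""
    segs = []
    for a, b in LIMBS:
        xa, ya, ca = keypoints[a]
        xb, yb, cb = keypoints[b]
        if ca >= min_conf and cb >= min_conf:
            segs.append(((xa, ya), (xb, yb)))
    return segs

def to_svg(segments, width, height):
    """Render segments as a minimal SVG document string."""
    lines = "".join(
        f'<line x1="{p[0]}" y1="{p[1]}" x2="{q[0]}" y2="{q[1]}" stroke="red" stroke-width="3"/>'
        for p, q in segments
    )
    return f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">{lines}</svg>'


person = [(0.0, 0.0, 0.0)] * 25
person[1] = (320.0, 120.0, 0.9)  # Neck
person[8] = (320.0, 260.0, 0.9)  # MidHip
person[2] = (280.0, 120.0, 0.8)  # RShoulder
segs = skeleton_segments(person)
print(len(segs))  # 2
svg = to_svg(segs, 640, 480)  # save to a .svg file to view it in a browser
```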

OpenPose Alternatives and Comparisons

As powerful as OpenPose is, it’s always worth exploring alternative pose estimation algorithms to determine which is best suited for your use case.

Here are a few OpenPose alternatives to consider.

OpenPose vs MediaPipe

Lightweight, cross-platform framework for mobile devices and desktops that enables real-time, high-accuracy hand, facial, and pose tracking.

One of the major advantages of MediaPipe is that it is optimized for mobile devices and can run on resource-constrained devices.

However, it has limited support for 3D pose estimation and requires a significant amount of preprocessing for input data.

OpenPose vs Detectron2

Provides pre-trained models for keypoint detection and pose estimation. Detectron2 is highly customizable and supports a wide range of models, including Mask R-CNN and RetinaNet.

However, it is more complex than other libraries, and its performance may be affected by hardware limitations.

OpenPose vs MMPose

A high-accuracy pose estimation framework that includes support for multi-person, 3D, and hand pose estimation. It also includes a variety of pre-trained models and data augmentation techniques for improved performance.

However, it may require more computational resources than some of the other algorithms, and it is currently only available in PyTorch.

OpenPose vs Lightweight-human-pose-estimation.pytorch

PyTorch-based pose estimation algorithm that is designed to be lightweight and fast. It uses a human pose estimation model that has been optimized for running on devices with limited computational resources, such as mobile devices and Raspberry Pi boards.

It can achieve real-time performance, making it suitable for applications such as human-computer interaction and sports analysis.

However, its accuracy may be lower than some of the more complex algorithms.

OpenPose vs Freemocap

Open-source, markerless motion capture system that uses computer vision techniques to estimate the 3D position of a person’s joints from a video stream. It includes support for multi-person pose estimation, as well as body and facial expression recognition.

It can be used for a variety of applications, including animation, gaming, and biomechanics research.

However, it may require more computational resources than some of the other algorithms, and its accuracy may be lower in challenging lighting conditions or with occlusions.

OpenPose vs AlphaPose

Offers faster performance than OpenPose and can detect multiple people in a single image or video stream.

However, because it follows a top-down approach (detecting each person first, then estimating that person's pose), its results depend on the person detector, so small or heavily occluded people may be missed.

OpenPose vs DeeperCut

Can offer high accuracy for fine-grained pose estimation and occluded body parts.

However, it is slower than OpenPose due to its reliance on graphical models and requires careful tuning of its hyperparameters.

OpenPose vs HRNet

Boasts state-of-the-art accuracy and fast inference time, making it well-suited for real-time pose estimation and multi-person scenarios.

However, it requires more computational resources than OpenPose due to its use of a deeper network architecture.

OpenPose vs EfficientPose

Offers efficient inference time and improved accuracy compared to other lightweight models, making it ideal for mobile and embedded applications.

However, it may not be as accurate as some of the more complex algorithms due to its lightweight nature.

OpenPose vs DensePose

Can handle more complex poses and motions and maps the image pixels of each person to the surface of a 3D human body model, making it a good choice for fashion and retail applications, virtual try-ons, and gaming and animation.

However, it requires higher quality input images and is only available for non-commercial use due to licensing restrictions.

Compare OpenPose to Other Human Pose Estimation Algorithms

Here is a table with these OpenPose alternatives:

Note: The license type and cost may vary depending on the specific use case and the terms of the license agreement. Please refer to the individual project websites for more information.

Best alternatives to OpenPose for commercial use

If you are planning to create a solution for commercial use requiring multi-person keypoint detection, the Ikomia team advises choosing either Detectron2 or MMPose.

Both of these alternatives are freely available for commercial use under the Apache 2.0 license and are actively maintained by a strong community. You can also discover these resources within the Ikomia HUB and leverage them through either the open-source Ikomia API or Ikomia STUDIO.