Building Real-time Object Detection on Live-streams

Spring-0 (spring93)

Posted on November 30, 2024


Artificial Intelligence (AI), or more specifically object detection, is a fascinating topic that opens a gateway to a wide variety of projects and ideas. I recently came across YOLO (You Only Look Once) from Ultralytics, which is a fast, accurate, and very easy-to-implement object detection model. In this post I will walk you through my process of building real-time object detection on live streams.

I have built this to work with RTSP (Real-Time Streaming Protocol) and HLS (HTTP Live Streaming) streams.

What Makes YOLO Special?

For starters, YOLO is extremely fast and excels at real-time object detection because of how it works internally, which is very different from other models such as R-CNN.

Algorithms like Faster R-CNN use a Region Proposal Network to find regions of interest, then perform detection on those regions over multiple stages. YOLO does it all in a single pass, hence the name "You Only Look Once".

In addition, YOLO requires very little training data for fine-tuning because its first 20 convolutional layers come pre-trained on the ImageNet dataset.

Now The Project!

Now that all the terminology is out of the way, I can dive into how I set up real-time object detection using YOLO.

First step is to install the required dependencies:

  • torch (install the CUDA-enabled build if you plan on using your GPU)
  • opencv-python for video processing
  • ultralytics for the YOLO model

Since I have an NVIDIA graphics card, I used CUDA to run the model on my GPU (which is much faster than the CPU).
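The basic install is a single pip command (the CUDA-enabled torch build, if you want it, is installed separately following PyTorch's own instructions):

pip install ultralytics opencv-python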

First, we load the YOLO model. I used YOLOv11 trained on the COCO (Common Objects in Context) dataset.

from ultralytics import YOLO

model = YOLO(r"C:\path\to\your\yolo_model.pt")
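If you want to use your GPU, the Ultralytics API lets you move the model to a device or pass a device with each predict call. A minimal sketch, assuming torch is installed with CUDA support:

import torch

# fall back to CPU when no CUDA device is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# alternatively, pass it per call: model(frame, device=device)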

Next, we capture the stream using opencv-python, read each frame in a loop, and run that frame through our YOLO model. Very straightforward.

import cv2

STREAM_URL = "rtsp://your-stream-url"  # your RTSP or HLS stream URL

video_cap = cv2.VideoCapture(STREAM_URL)
cv2.namedWindow("Detection Output", cv2.WINDOW_NORMAL)

while True:
    ret, frame = video_cap.read()  # read the next frame from the capture
    if not ret:
        break

    results = model(frame)  # get predictions on the frame from the YOLO model

    cv2.imshow("Detection Output", frame)  # display the frame

    if cv2.waitKey(1) == ord("q"):  # quit on "q" key press
        break

# Don't forget to quit gracefully!
video_cap.release()
cv2.destroyAllWindows()

That's it! This will print the predictions to your terminal, but what if you want to draw the bounding boxes, for example?

In results = model(frame), results is a list of predictions YOLO has made, and each of these predictions carries additional data such as bounding box coordinates, confidence scores, and class labels.

With this you can loop through the results list and draw whatever data you want to display from the predictions onto your frame.

Here is an example where I drew bounding boxes around the predictions:

    for box in results[0].boxes.xywh.tolist():
        center_x, center_y, width, height = box
        x1 = int(center_x - width / 2)  # top left x
        y1 = int(center_y - height / 2)  # top left y
        x2 = int(center_x + width / 2)  # bottom right x
        y2 = int(center_y + height / 2)  # bottom right y

        # rectangle parameters: frame, point1, point2, BGR color, thickness
        cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
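If you also want labels and confidence scores, the boxes object exposes those as well. Here is a rough sketch using the xyxy, conf, and cls attributes together with model.names (it goes inside the while loop, just like the snippet above):

boxes = results[0].boxes
for xyxy, conf, cls in zip(boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist()):
    x1, y1, x2, y2 = map(int, xyxy)  # corner coordinates in pixels
    label = f"{model.names[int(cls)]} {conf:.2f}"  # e.g. "person 0.87"
    cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
    cv2.putText(frame, label, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

Ultralytics also provides results[0].plot(), which returns a copy of the frame with boxes and labels already drawn, if you would rather not do it by hand.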

You can find the full code on my GitHub here.

Demo

For this demo, I used yt-dlp to get the direct stream URL from a YouTube livestream like so:

yt-dlp -g https://www.youtube.com/watch?v=VIDEO_ID
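If you want to do this from inside the script instead, one option (a sketch, assuming yt-dlp is installed and on your PATH, with VIDEO_ID as a placeholder) is to call yt-dlp via subprocess and hand the resulting URL to cv2.VideoCapture:

import subprocess
import cv2

youtube_url = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder livestream URL

# -g makes yt-dlp print the direct media URL(s) instead of downloading
output = subprocess.run(
    ["yt-dlp", "-g", youtube_url],
    capture_output=True, text=True, check=True,
).stdout

stream_url = output.strip().splitlines()[0]  # take the first URL printed
video_cap = cv2.VideoCapture(stream_url)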

With the following detection classes:

  • person
  • bicycle
  • car
  • motorcycle
  • bus
  • truck
  • cat
  • dog
  • sports ball

I purposely omitted labels and confidence scores to reduce clutter.
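To restrict detections to just those classes, the Ultralytics predict call accepts a classes argument with COCO class IDs. Rather than hardcoding the IDs, here is a small sketch that looks them up from model.names (the wanted set mirrors the list above):

# keep only the classes used in this demo
wanted = {"person", "bicycle", "car", "motorcycle", "bus",
          "truck", "cat", "dog", "sports ball"}
class_ids = [i for i, name in model.names.items() if name in wanted]

# predictions are filtered to these class IDs
results = model(frame, classes=class_ids)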

And that's that. Thanks for reading :)
