Simple DETR Object Detection with Python
Christopher
Posted on June 13, 2024
DETR (DEtection TRansformer) is a deep learning model designed for object detection. It uses the Transformer architecture, originally created for natural language processing (NLP) tasks, as its core element and predicts the full set of objects in an image directly, rather than relying on hand-tuned steps such as anchor boxes.
Prerequisites
This tutorial assumes you have some experience programming in Python and that Python is installed on your computer.
If you need to download Python, you can visit the official Python downloads page.
Create your virtual environment
Create a virtual environment in Python so the packages for this project are installed separately from your system-wide environment.
python -m venv myenv
Activate virtual environment
Windows
myenv\Scripts\activate
Mac/Linux
source myenv/bin/activate
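To confirm the environment is active before installing anything, a quick check from Python can help. This is a minimal sketch: the helper name in_virtualenv is made up for illustration, and it relies on the fact that inside a venv sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```

If this prints False, activate the environment again before running pip.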
Install packages
We will need to install a few packages before we get started.
pip install transformers torch Pillow requests
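A quick way to check that the install worked is to ask Python whether each module can be found. The sketch below is only an illustration; the names REQUIRED_MODULES and missing_modules are hypothetical. Note that the Pillow package is imported under the name PIL.

```python
import importlib.util

# Import names for the packages installed above.
# Pillow is imported as "PIL", not "Pillow".
REQUIRED_MODULES = ["transformers", "torch", "PIL", "requests"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```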
Next, create an /images folder in the root of your project. This is where you will save the images you use to test your AI solution. I'm using .jpg files from www.unsplash.com.
After saving an image into the /images directory, we can start writing the code that finds the image and passes it to the Image.open() method.
import os
from transformers import DetrImageProcessor, DetrForObjectDetection
import torch
from PIL import Image
# Sanity check that transformers imported correctly
print("transformers", DetrImageProcessor)
# Build the path to the images folder that sits next to this script
current_dir = os.path.dirname(os.path.abspath(__file__))
images_dir = os.path.abspath(os.path.join(current_dir, 'images'))
print("Images directory:", images_dir)
image_path = os.path.join(images_dir, 'airplane.jpg')  # change this to match your file name
print("Image path:", image_path)
print("Reading images from /images")
image = Image.open(image_path)
print("Processing image...")
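The script above opens a single hard-coded file. If your folder holds several images, a small helper can list them instead. This is a sketch under assumptions: the function name find_images and the set of accepted extensions are choices made for illustration.

```python
from pathlib import Path

# File extensions treated as images in this sketch (an assumption).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_images(images_dir):
    """Return a sorted list of image paths directly inside images_dir."""
    folder = Path(images_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Images folder not found: {folder}")
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Each returned path can be passed to Image.open() in a loop, in the same way the single image path is used above.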
Once the script above runs without errors, we can add the rest of the solution, which runs the model on the image and prints the detection results.
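# Load the pretrained processor and model; revision="no_timm" selects the variant that does not need the timm package
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
# Convert the image into model-ready tensors and run the model
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
# PIL reports size as (width, height); post-processing expects (height, width)
target_sizes = torch.tensor([image.size[::-1]])
# Keep only detections with a confidence score above 0.9
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]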
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
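To give a feel for what the post-processing step does, here is a plain Python sketch of its two main ideas: DETR predicts boxes as normalized (center_x, center_y, width, height) values, which are converted to pixel corner coordinates, and low-confidence detections are dropped. The function names and sample values are made up for illustration; this is not the library's implementation.

```python
def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    xmin = (cx - w / 2) * image_width
    ymin = (cy - h / 2) * image_height
    xmax = (cx + w / 2) * image_width
    ymax = (cy + h / 2) * image_height
    return (xmin, ymin, xmax, ymax)


def keep_confident(detections, threshold=0.9):
    """Keep (score, label, box) tuples whose score is above the threshold."""
    return [d for d in detections if d[0] > threshold]


if __name__ == "__main__":
    # Made-up example values, not real model output.
    raw = [
        (0.95, "cat", (0.5, 0.5, 0.5, 0.5)),
        (0.40, "dog", (0.2, 0.2, 0.1, 0.1)),
    ]
    for score, label, box in keep_confident(raw):
        print(label, score, cxcywh_to_xyxy(box, image_width=200, image_height=100))
```

Lowering the threshold lets more detections through, at the cost of more false positives.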
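After running the script (saved here as server.py), you should see output similar to the following; the lines printed at the start of the script are omitted here. The four numbers after location are the pixel coordinates of the bounding box around the detected object, in the order (xmin, ymin, xmax, ymax). The label and numbers will differ depending on your image.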
Reading images from /images
Processing image...
Detected bird with confidence 0.992 at location [55.82, 32.17, 225.04, 225.28]
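The box coordinates are easy to work with once you know their order. As a small sketch, the hypothetical helper box_size below computes the width and height of a detection in pixels, using the box from the sample output.

```python
def box_size(box):
    """Return (width, height) in pixels for an (xmin, ymin, xmax, ymax) box."""
    xmin, ymin, xmax, ymax = box
    return (round(xmax - xmin, 2), round(ymax - ymin, 2))


if __name__ == "__main__":
    # Box taken from the sample output above.
    sample_box = [55.82, 32.17, 225.04, 225.28]
    width, height = box_size(sample_box)
    print(f"Box is {width} px wide and {height} px tall")
```

The same four numbers, rounded to integers, can also be passed to Pillow's image.crop() to cut the detected object out of the image.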
Potential business value
Models like this can provide a lot of value to software services and products people interact with daily.
Object detection models can help detect signs of cancer in medical images, assist autonomous vehicles in identifying red lights and emergency signals, or help prevent unauthorized access to systems and physical resources by verifying a user's identity.
The possibilities are endless.