Micro Models - How to Build Using This Concept
Ytalo
Posted on June 26, 2024
In my last article (Developing for Scale and Quality with AI Micro-Models), I introduced the concept of micro-models for efficient AI model development. This time, we'll delve deeper into how to find micro-models, how to use them, and the benefits and trade-offs in terms of development time and efficiency. Let's get started!
Defining the Task
In the previous article, we set a fictional goal to "censor people" and broke it down into multiple models and steps. Today, we'll follow a similar approach. Begin by taking the final objective of your model and breaking it down into smaller, manageable tasks. For this example, let's aim to censor people in video clips.
Here's the process we'll follow:
- Detect the person.
- Crop them.
- Apply a blur effect.
- Reinsert the blurred region back into the original clip.

It's evident that using an AI model only for the detection step and handling the rest with plain code is more efficient than employing a single 'big model' for the entire task, as the sketch below illustrates.
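This is only an outline, assuming OpenCV for the video I/O and a detect_people stub that stands in for whichever detection model we pick in the next section; everything outside that one call is ordinary code:

import cv2

def detect_people(frame):
    # stub: replace with the detection model chosen later;
    # it should return a list of (x1, y1, x2, y2) integer boxes for people
    return []

def censor_clip(input_path, output_path):
    cap = cv2.VideoCapture(input_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for x1, y1, x2, y2 in detect_people(frame):  # the only AI step in the loop
            frame[y1:y2, x1:x2] = cv2.GaussianBlur(frame[y1:y2, x1:x2], (51, 51), 50)
        out.write(frame)
    cap.release()
    out.release()

censor_clip('input.mp4', 'censored.mp4')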
Searching for the Model
There are numerous sources to find models for development, such as GitHub, arXiv papers, and Hugging Face. I prefer Hugging Face due to its intuitive interface, but any of these platforms should suffice.
Once you've chosen your platform, start searching for models. On Hugging Face, there's a dedicated section for models. I navigated directly to this section and selected the type of model I needed: Object Detection. After that, it's just a matter of choosing a suitable model and learning how to use it.
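As a quick illustration of how little code it takes to try a candidate straight from the Hub, here is a minimal sketch using the transformers pipeline; facebook/detr-resnet-50 is just an example checkpoint, not the model this article ends up using:

from transformers import pipeline

# example checkpoint only; any object-detection model on the Hub works the same way
detector = pipeline("object-detection", model="facebook/detr-resnet-50")

detections = detector("person.jpg")
# each result looks like:
# {'score': 0.99, 'label': 'person', 'box': {'xmin': ..., 'ymin': ..., 'xmax': ..., 'ymax': ...}}
people = [d for d in detections if d["label"] == "person"]
print(people)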
Applying the Model
Implementing the model was straightforward. I provided the video (or individual video frames) as input. The model's output, a confidence score and the bounding-box coordinates of each detected person, was then fed into FFmpeg to handle the blurring of those regions in the frames.
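To give a rough idea of that hand-off, the snippet below drives FFmpeg from Python to blur one fixed region. It assumes a single box (x, y, w, h) taken from the detection output and a mostly static subject; a clip with movement needs the filter regenerated per segment or the frames processed individually:

import subprocess

# example box taken from the detector's output: top-left corner plus width and height
x, y, w, h = 640, 160, 200, 400

# crop the region, blur it, and overlay it back at the same position
filter_graph = f"[0:v]crop={w}:{h}:{x}:{y},boxblur=10[blurred];[0:v][blurred]overlay={x}:{y}"
subprocess.run([
    "ffmpeg", "-y", "-i", "input.mp4",
    "-filter_complex", filter_graph,
    "-c:a", "copy",
    "output.mp4",
], check=True)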
Efficiency and Savings
This approach not only enhances development efficiency but also significantly reduces resource consumption. By leveraging small specialist networks instead of large models, we improve both the speed of development and the execution time of the task.
Size of Micro-Models
There is a trade-off involved. For extremely long and complex tasks, we may need more micro-models, which require more RAM and storage to run. However, this trade-off results in significantly better execution times and higher precision, as the models are specialized and optimized for their specific tasks.
In summary, micro-models offer a highly efficient way to tackle AI development tasks by breaking down complex problems into smaller, manageable units. This not only speeds up development but also ensures optimal use of resources.
Example
Censor a person in an image
import cv2
import torch

# load our detection model, in this case YOLOv5 (the "x" variant) via torch.hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5x', pretrained=True)

# the model returns boxes plus a class id per detection; in COCO, class 0 is "person"
person_class_id = 0

# function to apply the blur to each detected box
def blur_people(image, boxes):
    for box in boxes:
        x1, y1, x2, y2 = map(int, box)
        image[y1:y2, x1:x2] = cv2.GaussianBlur(image[y1:y2, x1:x2], (51, 51), 50)
    return image

def censor_people_in_image(input_image_path, output_image_path):
    # load the image
    image = cv2.imread(input_image_path)
    if image is None:
        raise FileNotFoundError(f"Could not find or open the image {input_image_path}")

    # detect (OpenCV loads BGR, so flip to RGB for the model)
    results = model(image[..., ::-1])
    boxes = results.xyxy[0].cpu().numpy()  # each row: x1, y1, x2, y2, score, class

    # keep only the boxes classified as "person"
    person_boxes = [box[:4] for box in boxes if int(box[5]) == person_class_id]

    # apply the blur and save the result
    censored_image = blur_people(image, person_boxes)
    cv2.imwrite(output_image_path, censored_image)

censor_people_in_image('input.jpg', 'output_image.jpg')
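The same function covers the video goal from the start of the article: extract the frames, run censor_people_in_image on each one, and reassemble the clip, or hand the detected boxes to FFmpeg as shown earlier.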