How to Enhance Passport MRZ Detection in Python by Correcting Image Orientation

Passport Machine Readable Zone (MRZ) detection is sensitive to the orientation of the passport. If the passport is not right side up, the MRZ detection rate will be low. In this article, we will discuss how to improve the MRZ detection rate from rotated images with Python. Edge detection, perspective transformation and face detection will be used to correct the orientation of the passport.

Installation

pip install mrz-scanner-sdk document-scanner-sdk dlib mediapipe retina-face opencv-python

mrz-scanner-sdk: Dynamsoft MRZ Scanner SDK for MRZ detection. A valid license key is required to use the SDK. You can get a free trial license from here.
document-scanner-sdk: Dynamsoft Document Scanner SDK for edge detection and perspective transformation.
dlib: An open-source software library that provides an accurate and efficient face detection algorithm.
mediapipe: A Google-developed, open-source, cross-platform framework designed for rapid, real-time face detection.
retina-face: A deep learning based cutting-edge facial detector for Python coming with facial landmarks.
opencv-python: Used to display images and draw lines.

Passport Edge Detection and Perspective Transformation

Let's get started with a passport image taken in the correct orientation.

The following Python code detects the MRZ area successfully:

import argparse
import mrzscanner
import sys
import numpy as np
import cv2

def scanmrz():
    parser = argparse.ArgumentParser(description='Scan MRZ info from a given image')
    parser.add_argument('filename')
    parser.add_argument('-l', '--license', default='', type=str, help='Set a valid license key')
    args = parser.parse_args()
    try:
        filename = args.filename
        license = args.license

        # set license
        if  license == '':
            mrzscanner.initLicense("LICENSE-KEY")
        else:
            mrzscanner.initLicense(license)

        scanner = mrzscanner.createInstance()
        scanner.loadModel(mrzscanner.load_settings())

        image = cv2.imread(filename)
        results = scanner.decodeMat(image)
        for result in results:
            print(result.text)
            x1 = result.x1
            y1 = result.y1
            x2 = result.x2
            y2 = result.y2
            x3 = result.x3
            y3 = result.y3
            x4 = result.x4
            y4 = result.y4

            cv2.drawContours(image, [np.array([(x1, y1), (x2, y2), (x3, y3), (x4, y4)], dtype=np.int32)], 0, (0, 255, 0), 2)
            cv2.putText(image, result.text, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 255), 2)

        cv2.imshow("MRZ Detection", image)
        cv2.waitKey(0)

    except Exception as err:
        print(err)
        sys.exit(1)

if __name__ == "__main__":
    scanmrz()

If the image is rotated at a significant angle, MRZ detection may fail.

To address this issue, we can use edge detection and perspective transformation to correct the orientation of the passport.

Here are the steps:

Initialize the document scanner:

import docscanner
doc_scanner = docscanner.createInstance()
doc_scanner.setParameters(docscanner.Templates.color)

Detect the edges of the passport:

results = doc_scanner.detectMat(image)
result = results[0]
x1 = result.x1
y1 = result.y1
x2 = result.x2
y2 = result.y2
x3 = result.x3
y3 = result.y3
x4 = result.x4
y4 = result.y4
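
Note that `results[0]` raises an `IndexError` when no document is found. A minimal sketch of a guard follows, assuming only that each result exposes the `x1` to `y4` attributes used above. The helper name `first_quad` and the stand-in result object are hypothetical and exist only for illustration.

```python
from types import SimpleNamespace

def first_quad(results):
    """Return the four corners of the first detected document, or None."""
    if not results:
        return None
    r = results[0]
    return [(r.x1, r.y1), (r.x2, r.y2), (r.x3, r.y3), (r.x4, r.y4)]

# Stand-in for an SDK result, used only to exercise the helper.
fake = SimpleNamespace(x1=10, y1=20, x2=310, y2=25, x3=305, y3=220, x4=12, y4=215)
print(first_quad([fake]))
print(first_quad([]))
```

When the helper returns None, the caller can fall back to running MRZ detection on the original image instead of the rectified one.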

Rectify the passport image:

rectified_document = doc_scanner.normalizeBuffer(image, x1, y1, x2, y2, x3, y3, x4, y4)
rectified_document = docscanner.convertNormalizedImage2Mat(rectified_document)
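
The SDK performs the perspective transformation internally. To give a rough idea of the geometry involved, the sketch below estimates the size of the rectified image from the four detected corners by taking the longer of each pair of opposite edges. It assumes the corners are ordered top-left, top-right, bottom-right, bottom-left. It is not the SDK's implementation, and `rectified_size` is a hypothetical helper.

```python
import math

def rectified_size(corners):
    """Estimate (width, height) of the flattened document.

    corners: four (x, y) points ordered top-left, top-right,
    bottom-right, bottom-left (an assumption of this sketch).
    """
    tl, tr, br, bl = corners
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)

print(rectified_size([(0, 0), (400, 0), (400, 300), (0, 300)]))  # (400, 300)
```

With OpenCV alone, these dimensions would typically be passed to `cv2.getPerspectiveTransform` and `cv2.warpPerspective` to produce the flattened image.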

Detect the MRZ area from the rectified passport image:

def detect_mrz(image, scanner):
    s = ""
    results = scanner.decodeMat(image)
    for result in results:
        # print(result.text)
        s += result.text + '\n'
        x1 = result.x1
        y1 = result.y1
        x2 = result.x2
        y2 = result.y2
        x3 = result.x3
        y3 = result.y3
        x4 = result.x4
        y4 = result.y4

        cv2.drawContours(image, [np.array([(x1, y1), (x2, y2), (x3, y3), (x4, y4)], dtype=np.int32)], 0, (0, 255, 0), 2)
        cv2.putText(image, result.text, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 255), 2)

    cv2.imshow("MRZ Detection", image)

detect_mrz(rectified_document, mrz_scanner)

Rotating Images Based on Facial Orientation

After perspective transformation, the image may be oriented in one of four directions: 0 degrees, 90 degrees, 180 degrees, or 270 degrees.

If you run the code above, you will find only the 0-degree orientation allows for normal MRZ detection. Thus, we aim to rotate the other three orientations to this correct angle. Considering that the orientation of the face on the passport is consistent with that of the Machine-Readable Zone, we can use face detection to rotate the image accordingly.

Numerous face detection algorithms exist, each with varying levels of performance. In this article, we will compare the effectiveness of three prominent algorithms: Dlib, MediaPipe, and RetinaFace.

Dlib

Download the pre-trained model from here.
Unzip the file and put it in the same folder as the Python script.

Create the Dlib face detector:

import dlib
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

Detect the faces from the rectified passport image:

import time

img = cv2.imread(filename)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

start_time = time.time()
faces = detector(gray)
end_time = time.time()
print("Elapsed Time:", end_time - start_time)

The Dlib face detection algorithm is typically trained on datasets where faces are upright or near-upright, so the features learned by the classifier assume that orientation. When a face is rotated significantly (upside down or tilted by 90 degrees), the learned features no longer match well, and the algorithm has difficulty detecting the face.
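
One way to work around a detector that only handles upright faces is to try the image in each of the four orientations and keep the first one in which a face is found. The following is a sketch of that idea only, not the approach this article ends up using. Here `detect` stands for any callable that returns a list of faces, and the image is a plain nested list so that the example runs without extra dependencies. With OpenCV, `cv2.rotate` would perform the rotation.

```python
def rotate90_cw(img):
    """Rotate a row-major nested-list image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def find_upright(img, detect):
    """Return (clockwise_degrees, rotated_img) for the first orientation
    in which detect() reports a face, or (None, img) if none does."""
    candidate = img
    for degrees in (0, 90, 180, 270):
        if detect(candidate):
            return degrees, candidate
        candidate = rotate90_cw(candidate)
    return None, img

# Toy detector: a "face" is found when the marker 'F' is in the top-left cell.
def detect(im):
    return ["face"] if im[0][0] == "F" else []

sideways = [[".", "."], ["F", "."]]  # marker sits in the bottom-left corner
print(find_upright(sideways, detect))
```

The cost of this brute-force approach is up to four detector calls per image, which matters for the slower detectors compared below.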

MediaPipe

Download the pre-trained model from here. At present, only BlazeFace (short-range) is available, which is a lightweight model for detecting single or multiple faces.
Put the model in the same folder as the Python script.

Create the MediaPipe face detector:

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils

base_options = python.BaseOptions(model_asset_path='blaze_face_short_range.tflite')
options = vision.FaceDetectorOptions(base_options=base_options)
detector = vision.FaceDetector.create_from_options(options)

Detect the faces from the rectified passport image:

img = cv2.imread(filename)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
image = mp.Image(image_format=mp.ImageFormat.SRGB, data=img)

start_time = time.time()
detection_result = detector.detect(image)
end_time = time.time()
print("Elapsed Time:", end_time - start_time)

Compared to Dlib, MediaPipe is faster and more accurate. However, it still falls short of our requirements because it fails to detect some facial landmarks correctly.
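
Because the same timing pattern is repeated for each detector, it can be wrapped in a small helper. The sketch below uses `time.perf_counter`, which is better suited to measuring short intervals than `time.time`. The name `timed` is hypothetical, and the `sum` call is only a stand-in for a real detector call.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; with a real detector this would be e.g. timed(detector, gray).
result, elapsed = timed(sum, range(1000))
print(result, f"{elapsed:.6f}s")
```

Averaging over several runs gives a fairer comparison, since the first call to a detector often includes model loading or warm-up time.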

RetinaFace

RetinaFace is a deep learning-based face detection model aimed at identifying faces in images with high accuracy. Let's explore whether it meets our objectives.

from retinaface import RetinaFace
img = cv2.imread(filename)
obj = RetinaFace.detect_faces(img_path=img)

if isinstance(obj, dict):
    for key in obj:
        identity = obj[key]

        facial_area = identity["facial_area"]
        facial_img = img[facial_area[1]: facial_area[3],
                            facial_area[0]: facial_area[2]]

        landmarks = identity["landmarks"]
        left_eye = landmarks["left_eye"]
        right_eye = landmarks["right_eye"]
        nose = landmarks["nose"]
        mouth_right = landmarks["mouth_right"]
        mouth_left = landmarks["mouth_left"]

        cv2.rectangle(img, (facial_area[0], facial_area[1]),
                        (facial_area[2], facial_area[3]), (0, 255, 0), 2)
        cv2.circle(img, (int(left_eye[0]), int(
            left_eye[1])), 2, (255, 0, 0), 2)
        cv2.circle(img, (int(right_eye[0]), int(
            right_eye[1])), 2, (0, 0, 255), 2)
        cv2.circle(img, (int(nose[0]), int(nose[1])), 2, (0, 255, 0), 2)
        cv2.circle(img, (int(mouth_left[0]), int(
            mouth_left[1])), 2, (0, 155, 255), 2)
        cv2.circle(img, (int(mouth_right[0]), int(
            mouth_right[1])), 2, (0, 155, 255), 2)

cv2.imshow(filename, img)

RetinaFace takes the longest time for face detection, but it is the most accurate. It correctly identifies facial landmarks in all four directions, which we can use to rotate the image.

def rotate(img, left_eye, right_eye, nose):

    nose_x, nose_y = nose
    left_eye_x, left_eye_y = left_eye
    right_eye_x, right_eye_y = right_eye

    if (nose_y > left_eye_y) and (nose_y > right_eye_y):
        return img # nose below the eyes: already upright
    elif (nose_y < left_eye_y) and (nose_y < right_eye_y):
        return cv2.flip(img, flipCode=-1) # nose above the eyes: rotate 180 degrees
    elif (nose_x < left_eye_x) and (nose_x < right_eye_x):
        transposed = cv2.transpose(img)
        return cv2.flip(transposed, flipCode=0) # nose left of the eyes: rotate 90 degrees counter-clockwise
    else:
        transposed = cv2.transpose(img)
        return cv2.flip(transposed, flipCode=1) # nose right of the eyes: rotate 90 degrees clockwise
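
The branch logic can be checked without any image by returning a label instead of a rotated image. The sketch below mirrors the conditions in `rotate`. The function name `needed_rotation` and the sample coordinates are made up for illustration, and it uses image coordinates, where y grows downward.

```python
def needed_rotation(left_eye, right_eye, nose):
    """Mirror of rotate(): name the correction instead of applying it."""
    nose_x, nose_y = nose
    if nose_y > left_eye[1] and nose_y > right_eye[1]:
        return "none"      # nose below both eyes: already upright
    if nose_y < left_eye[1] and nose_y < right_eye[1]:
        return "180"       # nose above both eyes: upside down
    if nose_x < left_eye[0] and nose_x < right_eye[0]:
        return "90 ccw"    # nose left of both eyes
    return "90 cw"         # nose right of both eyes

print(needed_rotation((40, 50), (80, 50), (60, 75)))  # none
print(needed_rotation((50, 80), (50, 40), (25, 60)))  # 90 ccw
```

Transposing and then flipping around the x-axis (`flipCode=0`) is a 90-degree counter-clockwise rotation, while transposing and flipping around the y-axis (`flipCode=1`) is a 90-degree clockwise rotation.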

Combining Document Detection and RetinaFace Detection for MRZ Detection

We can now combine the above steps to detect the MRZ area in rotated passport images.

import argparse
import mrzscanner
import sys
import numpy as np
import cv2
import docscanner
import time
import face_retina

def detect_mrz(image, scanner):
    ...

def detect_doc(image, scanner):
    ...

    return mat

def scanmrz():
    parser = argparse.ArgumentParser(description='Scan MRZ info from a given image')
    parser.add_argument('filename')
    parser.add_argument('-l', '--license', default='', type=str, help='Set a valid license key')
    args = parser.parse_args()
    try:
        filename = args.filename
        license = args.license

        # set license
        if  license == '':
            defaultLicense = "LICENSE-KEY"
            mrzscanner.initLicense(defaultLicense)
            docscanner.initLicense(defaultLicense)
        else:
            mrzscanner.initLicense(license)
            docscanner.initLicense(license)

        mrz_scanner = mrzscanner.createInstance()
        mrz_scanner.loadModel(mrzscanner.load_settings())
        doc_scanner = docscanner.createInstance()
        doc_scanner.setParameters(docscanner.Templates.color)

        image = cv2.imread(filename)
        copy = image.copy()
        copy = detect_doc(copy, doc_scanner)
        copy = face_retina.detect(copy)
        detect_mrz(copy, mrz_scanner)

        cv2.imshow("Original", image)


    except Exception as err:
        print(err)
        sys.exit(1)

if __name__ == "__main__":
    scanmrz()
    cv2.waitKey(0)

Source Code

https://github.com/yushulx/python-mrz-scanner-sdk/tree/main/examples/enhanced
