How to Extract Email & Phone Number from a Business Card Using Python, OpenCV, and TesseractOCR

abhiwalia15

Mrinal Walia

Posted on May 22, 2020

In this blog post, you will learn how to extract the email address and phone number from a business card and save the output to a JSON file.

Side note: If you are new to this, you can try the DataCamp courses Web Scraping in Python, Introduction to Matplotlib in Python, and Exploratory Data Analysis in Python, which helped me a lot when I started my journey into web scraping. If you already have good experience in Python, you can take up the course on Image Processing and Computer Vision instead.

You can build the email and phone number extractor with OpenCV and TesseractOCR in five easy steps:

  • Step 1: Detect the edges of the document we want to scan.

  • Step 2: Using these edges, find the contour (outline) of the document being scanned.

  • Step 3: Apply a perspective transform to obtain the top-down view of the document.

  • Step 4: Use pytesseract to extract the text from the scanned image.

  • Step 5: Apply regex to identify only the email and phone number in the extracted text and save the output.

Here is the link to the full article: https://datascienceplus.com/how-to-extract-email-phone-number-from-a-business-card-using-python-opencv-and-tesseractocr/

You can download the source code for this blog post here: My GitHub repository.
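
To make Step 5 concrete, here is a minimal sketch of the regex and JSON part, using only the standard library. The two patterns and the function names `extract_contacts` and `save_contacts` are assumptions for illustration, not the ones from the repository; the phone pattern in particular is deliberately loose and will need tuning for the number formats on your cards. The input `text` is assumed to be the string that pytesseract returned in Step 4.

```python
import json
import re

# Illustrative patterns (assumptions): a simple email shape, and a phone
# number made of at least 9 digit/separator characters on a single line.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def extract_contacts(text):
    """Return every email address and phone-like string found in the OCR text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


def save_contacts(contacts, path):
    """Write the extracted contacts to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(contacts, f, indent=2)
```

A typical call would be `save_contacts(extract_contacts(text), "contacts.json")`, where the output file name is again just an example. OCR output is often noisy, so it is worth checking the matches by eye before trusting them.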

Follow me on LinkedIn: Mrinal Walia
Follow me on GitHub: Mrinal Walia
