Tiny Programs 1: docshund-rs
Connor
Posted on March 25, 2022
Long story short, I've wound up starting work on a small Tesseract OCR program. I call it docshund-rs, because it finds things in documents like a dachshund finds gophers in holes, and it's written in Rust. I'm intensely creative.
It took me longer to remember how Rust does Result<> type returns, and how to unwrap the results of the tesseract-rs calls accordingly, than it did to get the program working.
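For anyone else who keeps forgetting, here is roughly the pattern in question. This is a minimal sketch. The `recognize` function and `OcrError` type are made up and stand in for whatever tesseract-rs actually returns; neither is part of that crate. The sketch shows three ways to handle a `Result`. `unwrap()` panics on an `Err`. The `?` operator hands the error back to the caller. A `match` deals with both cases explicitly.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error type standing in for whatever tesseract-rs returns.
#[derive(Debug)]
struct OcrError(String);

impl fmt::Display for OcrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "OCR failed: {}", self.0)
    }
}

impl Error for OcrError {}

// Hypothetical stand-in for an OCR call; NOT the real tesseract-rs API.
fn recognize(path: &str) -> Result<String, OcrError> {
    if path.ends_with(".png") || path.ends_with(".jpg") || path.ends_with(".tif") {
        Ok(format!("text from {}", path))
    } else {
        Err(OcrError(format!("unsupported file: {}", path)))
    }
}

// `?` returns early with the error (boxed via From) instead of panicking
// the way `recognize(path).unwrap()` would.
fn scan(path: &str) -> Result<String, Box<dyn Error>> {
    let text = recognize(path)?;
    Ok(text.trim().to_string())
}

fn main() {
    // `match` handles the Ok and Err arms explicitly.
    for path in ["page1.png", "notes.docx"] {
        match scan(path) {
            Ok(text) => println!("{}: {}", path, text),
            Err(e) => eprintln!("{}: {}", path, e),
        }
    }
}
```

`unwrap()` is fine for a quick prototype. `?` keeps a failed scan from taking the whole program down with it.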
Though, all told, it's already pretty cool: it can scan image files like JPEG, PNG and TIFF with a reasonable degree of accuracy.
Ultimately, I think docshund-rs will be a program that can take a PDF file, turn it into images, process a bunch of those pages concurrently, and then barf the output back out as a searchable PDF, or at least as a plain text dump.
All of this is also subject to my interest level in the project, which usually varies wildly.
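This is a rough sketch of what the concurrent middle step of that plan might look like. It assumes the PDF has already been rasterized into page images. `ocr_page` and `ocr_document` are hypothetical names, and `ocr_page` is a stub rather than a real tesseract-rs call. The sketch uses `std::thread::scope` (Rust 1.63 or later), so each page gets its own thread that can borrow the page data directly. The results are then joined back together in page order.

```rust
use std::thread;

// Hypothetical per-page OCR step; a real version would call into tesseract-rs.
fn ocr_page(index: usize, image: &[u8]) -> Result<String, String> {
    if image.is_empty() {
        Err(format!("page {} is empty", index + 1))
    } else {
        Ok(format!("page {}: {} bytes scanned", index + 1, image.len()))
    }
}

// Runs ocr_page on every page concurrently with scoped threads, then joins
// the results in page order so the text dump reads front to back.
fn ocr_document(pages: &[Vec<u8>]) -> Result<String, String> {
    let results: Vec<Result<String, String>> = thread::scope(|s| {
        let handles: Vec<_> = pages
            .iter()
            .enumerate()
            .map(|(i, page)| s.spawn(move || ocr_page(i, page)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("OCR thread panicked"))
            .collect()
    });
    // Collecting into Result<Vec<_>, _> yields the first failed page's error, if any.
    let texts: Vec<String> = results.into_iter().collect::<Result<_, _>>()?;
    Ok(texts.join("\n"))
}

fn main() {
    // Hypothetical page images; real ones would come from rasterizing a PDF.
    let pages = vec![vec![0u8; 10], vec![0u8; 20], vec![0u8; 30]];
    match ocr_document(&pages) {
        Ok(text) => println!("{}", text),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

Spawning one thread per page is the simplest thing that works. A long PDF would probably want a bounded pool, or a crate like rayon, instead.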
Though I think I'll keep a running tab of Tiny Programs and link it all together as a series, regardless.
Title photo by James Watson on Unsplash