
Receipt Reader Thingy

4/01/2020

There really isn’t a cool title for this project. I’m trying to use OCR to read a receipt and store its text in a spreadsheet, organized by keyword. Obviously, the part I’ll really be working on is getting that text onto the spreadsheet. There are lots of tutorials online about using Python with an OCR engine to dump everything in an image that IS text into a text file; it’s the added step of sorting through that text that isn’t quite perfected.
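A rough sketch of what the keyword-sorting step might look like, assuming the OCR output is plain text with one receipt line per line. The keyword-to-column mapping (`KEYWORDS`, with "Total", "Tax" and "Date") is a hypothetical placeholder, not a decided set of categories, and the "spreadsheet" here is just CSV, which Excel or Google Sheets can open.

```python
import csv
import io

# Hypothetical keyword -> spreadsheet column mapping; the real categories are undecided.
KEYWORDS = {
    "total": "Total",
    "tax": "Tax",
    "date": "Date",
}


def sort_lines(ocr_text, keywords=KEYWORDS):
    """Group OCR'd lines under the column of the first keyword they contain."""
    columns = {column: [] for column in keywords.values()}
    columns["Other"] = []
    for raw_line in ocr_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # OCR output often contains blank lines
        lowered = line.lower()
        for keyword, column in keywords.items():
            if keyword in lowered:  # naive substring match
                columns[column].append(line)
                break  # first matching keyword wins
        else:
            columns["Other"].append(line)  # no keyword matched
    return columns


def to_csv(columns):
    """Write one (category, text) row per line, with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Category", "Text"])
    for column, lines in columns.items():
        for line in lines:
            writer.writerow([column, line])
    return buffer.getvalue()
```

Because the match is a plain substring check, a line like "Subtotal 9.00" also lands under Total; that's probably fine for a first pass and easy to tighten later.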


My plan is to get an OCR script running that scans a specified image and saves the recognized text as a text file (basic step one), then opens another script that starts the sorting process. I don’t know yet whether it will be easier to make each type of sorting its own script, or to first get everything working as one large Frankensteined thing. In the end, I don’t think that will be my biggest worry. My biggest worry, most definitely, will be figuring out how to install all the dependencies.
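One way to keep both options open is to write each stage as its own function in a single file and split them into separate scripts later if that turns out easier. This is a minimal sketch, assuming the `pytesseract` wrapper and Pillow are what handle the OCR stage; the two sorters (`count_lines`, `find_price_lines`) and the `SORTERS` table are hypothetical examples, not decided categories. Passing a different `ocr` function lets the sorting half be tested without Tesseract installed at all.

```python
import re
from pathlib import Path


def ocr_image(image_path):
    """Step one: read all text in the image (assumes Tesseract, pytesseract and Pillow are installed)."""
    # Imported inside the function so the sorting steps still run without the OCR stack.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path))


def find_price_lines(lines):
    """Hypothetical sorter: keep lines containing something shaped like a price, e.g. 3.49."""
    return [line for line in lines if re.search(r"\d+\.\d{2}\b", line)]


def count_lines(lines):
    """Hypothetical sorter: how many non-empty lines the OCR produced."""
    return len(lines)


# Each type of sorting is one entry here; any of them could later move to its own script.
SORTERS = {
    "line_count": count_lines,
    "price_lines": find_price_lines,
}


def run_pipeline(image_path, text_path, ocr=ocr_image):
    """OCR the image, save the raw text file, then run every sorter on the saved text."""
    text = ocr(image_path)
    Path(text_path).write_text(text, encoding="utf-8")

    # Re-read the saved file, the same way a separate sorting script would.
    saved = Path(text_path).read_text(encoding="utf-8")
    lines = [line.strip() for line in saved.splitlines() if line.strip()]
    return {name: sorter(lines) for name, sorter in SORTERS.items()}
```

With everything installed, a call like `run_pipeline("receipt.jpg", "receipt.txt")` (file names hypothetical) would do both steps in one go.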


I’ve installed Tesseract, OpenCV, Matplotlib, and NumPy, but even then the tutorial scripts give me errors. I’ve tried them in two of what I think are at least four versions of Python, so it could just be a dependency issue. I’ll update the project as I go, adding to the top so the post starts with a beautiful diagram of how it actually works and then devolves into how confused I was about getting started. Well… here’s to hoping.
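Each Python installation keeps its own set of installed packages, so something installed for one version isn’t importable from another, which is a common cause of tutorial scripts failing. A small check like this, run with each interpreter, shows what that particular Python can actually see. The module names are the usual import names (OpenCV imports as `cv2`), and `pytesseract` is an assumption about which Tesseract wrapper the tutorials use; Tesseract itself is a separate program, so it is looked up on the PATH instead.

```python
import importlib.util
import shutil
import sys

# Display name -> import name. pytesseract is assumed to be the wrapper in use.
MODULES = {
    "OpenCV": "cv2",
    "Matplotlib": "matplotlib",
    "NumPy": "numpy",
    "pytesseract": "pytesseract",
}


def check_dependencies(modules=MODULES):
    """Report which packages this interpreter can import, plus whether tesseract is on PATH."""
    report = {name: importlib.util.find_spec(module) is not None for name, module in modules.items()}
    # Tesseract is an executable, not a Python package.
    report["tesseract executable"] = shutil.which("tesseract") is not None
    return report


if __name__ == "__main__":
    # Show exactly which interpreter is running, since that's what the report applies to.
    print("Python", sys.version.split()[0], "at", sys.executable)
    for name, found in check_dependencies().items():
        print(f"{name:22} {'found' if found else 'MISSING'}")
```

If something shows up as MISSING, installing it with `python -m pip install <package>` using that same `python` puts it where that interpreter will look.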