Use Paperwork to digitize and archive documents

Yuriy Klochan, 123RF

From issue 33 / 2017

Paper Trail

Karsten Günther

Digital archives do away with the need for traditional filing cabinets, and Paperwork aims to make building one easier.

Paperwork grew out of the desire for a paperless office. Letters, bills, and loose pages go into a scanner, which delivers PDF and JPEG files to a digital in-tray. OCR then converts the contents of these files into machine-readable text.

This is where Paperwork [1] comes into play. The application collects the image data and the recognized text, overlays them, and saves the result as a PDF. It also builds a searchable index from the text content of the processed documents. However, the process has some pitfalls you should watch out for. Scans and photographs need a high resolution so that the software can recognize the text properly, which means that a good scanner with a resolution of at least 600 DPI is a must.
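To put the 600 DPI figure into perspective, the following Python sketch converts a paper size into the pixel dimensions a scan needs and checks whether an existing image reaches that resolution. The function names and the A4 example are illustrative assumptions and are not part of Paperwork.

```python
MM_PER_INCH = 25.4
MIN_DPI = 600  # minimum resolution recommended in the text


def pixels_for(size_mm, dpi=MIN_DPI):
    """Number of pixels needed to cover size_mm at the given resolution."""
    return round(size_mm / MM_PER_INCH * dpi)


def effective_dpi(pixels, size_mm):
    """Resolution that results from spreading pixels over size_mm."""
    return pixels / (size_mm / MM_PER_INCH)


def meets_minimum(pixels, size_mm, min_dpi=MIN_DPI):
    """True if a scan of a page edge of size_mm reaches min_dpi."""
    return effective_dpi(pixels, size_mm) >= min_dpi


if __name__ == "__main__":
    # A4 paper measures 210 x 297 mm.
    print(pixels_for(210), "x", pixels_for(297))  # 4961 x 7016
    # A 2480-pixel-wide A4 scan is only about 300 DPI.
    print(meets_minimum(2480, 210))  # False
```

An A4 page at 600 DPI therefore comes to roughly 35 megapixels, which is worth keeping in mind when planning storage for a large archive.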

At startup, Paperwork first looks for Tesseract [2]. If it cannot find this powerful OCR engine, the program falls back to CuneiForm. In most cases, Tesseract delivers the best results. You can install it with:

[...]
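The engine lookup described above (Tesseract first, CuneiForm as the fallback) can be sketched in a few lines of Python. This is not Paperwork's own code. The executable names tesseract and cuneiform are assumptions based on the usual command names, and a distribution may package them differently.

```python
import shutil

# Assumed executable names, listed in order of preference.
OCR_ENGINES = ("tesseract", "cuneiform")


def find_ocr_engine(candidates=OCR_ENGINES, which=shutil.which):
    """Return (name, path) of the first candidate found on PATH, else None.

    The which parameter exists only so that the lookup can be replaced
    in tests; by default it is the standard shutil.which.
    """
    for name in candidates:
        path = which(name)
        if path is not None:
            return name, path
    return None


if __name__ == "__main__":
    engine = find_ocr_engine()
    if engine is None:
        print("No OCR engine found; install Tesseract or CuneiForm.")
    else:
        print(f"Using {engine[0]} at {engine[1]}")
```

Running the script before starting Paperwork is a quick way to see which engine, if any, is available on the system.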
