OCR Documents

A Few Words on OCR

OCR (Optical Character Recognition) refers to software that converts an image of scanned text into machine-readable, editable text. This text can then be used in programs such as a word processor or spreadsheet. If you have documents such as notes, minutes, or outlines, it may be useful to convert the scanned images to OCR text. When using OCR, keep the following in mind:

  • OCR software will NOT convert all words in your document correctly. The text in the converted document requires proofreading and editing.
  • Documents with poor contrast between the ink and the paper will produce more conversion errors.
  • When you scan a document that you want to convert with OCR, first save the scanned image (the non-OCR version), then convert that image with OCR, and make sure to retain BOTH versions. You will probably not have time to proofread and edit the OCR version right away. When you do proofread it later, you may find mistakes that you cannot decipher. When (not IF) this happens, you will want to view the scanned version to see what the words should be, and then correct the OCR document by referring to it.

OCR output typically has a high error rate. OCR software has improved dramatically, but it is FAR from perfect. Be aware of the high error rate and plan accordingly.