
Document Imaging

Text Conversion

Optical Character Recognition (OCR)

Images (TIF, PDF or other formats) are only pictures: graphical representations of the original document. Optical character recognition is the process of converting such an image (a collection of small dots called “pixels”) into searchable text.

Image formats are not inherently searchable.

When utilized with a robust search engine, OCR'd text allows users to find “any word on any document” very quickly.  A portion of documents may require conversion to a full-text format. 

After the selected documents have been imaged, AmDoc uses high-speed optical character recognition technology to convert the images to text. The quality and condition of the original document greatly affect the quality of the OCR output. If needed, the text can be “cleaned up” to restore the original appearance of the document.

During text conversion, also known as Optical Character Recognition (OCR), the computer system converts each character in the image into machine-readable text. This process is dependent upon the clarity of the scanned image. If the original document was degraded, such as a photocopy of a photocopy of a fax (known as a “third-generation photocopy”), the scanned image will be correspondingly unclear.

When the characters of the original document are crisp, the text conversion is good. Whenever the characters are broken or touching, as happens with multi-generation photocopies, extremely small fonts or poor-quality print (such as phone books or newspapers), the text conversion may not be 100% accurate.
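One common way to put a number on "not 100% accurate" is a character accuracy score: compare the OCR output against a hand-verified transcription of the same page and count how many single-character edits separate them. The sketch below is illustrative only; the function names and the sample strings are assumptions made for this example and do not describe AmDoc's own tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_text: str) -> float:
    """Share of the reference text that the OCR output got right (0.0 to 1.0)."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(reference, ocr_text)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical sample: touching characters often turn "rn" into "m".
    reference = "modern payment terms"
    ocr_text = "modem payment terms"
    print(f"accuracy: {character_accuracy(reference, ocr_text):.1%}")
```

Scoring a small sample of pages this way, before converting a whole collection, gives a rough idea of how much the condition of the originals will cost in accuracy.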
 

AmDoc will convert all "typewritten" documents to computer-readable text for inclusion in the online retrieval system. This allows users to find information using either the coded data or the full text. Issues and key words can be located in a more efficient, cost-effective manner.

OCR accuracy depends primarily on the condition of the original document. Accuracy is best for clean, first-generation original documents with standard sans-serif fonts. Accuracy drops when multi-generation photocopies or old fax documents are used, when the characters are broken or touching, or when the paper color is not white. AmDoc offers a clean-up process that examines and corrects inaccurate text. However, modern retrieval systems can usually find documents even when words in them have been misspelled or misrecognized. AmDoc's recommendation is to test retrieval before doing any text clean-up. If the important information can be found quickly, further clean-up is generally not needed.
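The idea of finding a document despite a misrecognized word can be sketched with approximate string matching. The example below uses Python's standard `difflib` module; the `find_documents` function, the sample documents and the 0.8 similarity cutoff are assumptions chosen for illustration, and a production retrieval system would likely use its own fuzzy-search method.

```python
import difflib
import re


def find_documents(query: str, documents: dict[str, str], cutoff: float = 0.8) -> list[str]:
    """Return the ids of documents that contain a word approximately
    matching the query, so minor OCR errors do not hide a hit."""
    query = query.lower()
    hits = []
    for doc_id, text in documents.items():
        # Split the OCR text into lowercase words.
        words = sorted(set(re.findall(r"[a-z0-9]+", text.lower())))
        # get_close_matches keeps only words whose similarity ratio >= cutoff.
        if difflib.get_close_matches(query, words, n=1, cutoff=cutoff):
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    # Hypothetical OCR output: "invoice" was misread as "invoicc".
    documents = {
        "doc-001": "The invoicc total is due in thirty days",
        "doc-002": "Meeting notes for Tuesday",
    }
    print(find_documents("invoice", documents))
```

Running a handful of important search terms through a test like this, against text that has not been cleaned up, is one way to carry out the recommended retrieval check before paying for clean-up.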