PDF Image + Searchable Text Conversion:
A PDF Image + Searchable Text file (formerly known as PDF plus
hidden text) contains a bitmapped image of the original and a
hidden layer of searchable text.
The conversion process involves scanning the hardcopy original,
performing OCR (Optical Character Recognition) to capture
the text of the document, and distilling the two layers into
a PDF searchable image file. Although the text can be searched,
hyperlinks and bookmarks are not fully functional in this
format. As with PDF image only, PDF searchable image files
are only as legible as the original. PDF searchable image
files also have the largest file size of the three types, which
can be a significant issue if the document is bound for the Internet.
Because pages are displayed as images, what the reader sees is
inherently faithful to the original. Text produced by the OCR
process may be “bonded” to the originating image to create a
PDF/Searchable Image file. When you search for words or phrases,
they are highlighted in the image. This background text makes
the document searchable, but its accuracy depends on the quality
of your originals and other factors, such as italicized or small
type. For this background text, you have two options:
- PDF Image + Text (raw, uncorrected OCR text)
- PDF Image + Text (corrected or proofread)
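A viewer can only highlight a match in the image if the hidden text layer records where each word sits on the page. The sketch below assumes a simplified layer in which every OCR'd word carries a pixel bounding box; the `Word` class, `find_phrase` function, and sample coordinates are hypothetical and do not describe the internal structure of any particular PDF viewer. A search maps each matching run of words back to a rectangle to draw over the image.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR'd word in a hypothetical hidden text layer."""
    text: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in page pixels


def normalize(token: str) -> str:
    # Case-insensitive match that ignores surrounding punctuation ("No." == "no").
    return token.lower().strip(".,;:!?\"'()")


def find_phrase(words: list[Word], phrase: str) -> list[tuple[int, int, int, int]]:
    """Return one highlight rectangle per occurrence of `phrase`.

    Each rectangle is the union of the matched words' boxes, which assumes
    the phrase does not wrap onto another line.
    """
    target = [normalize(t) for t in phrase.split()]
    n = len(target)
    if n == 0:
        return []
    hits = []
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [normalize(w.text) for w in window] == target:
            hits.append((
                min(w.box[0] for w in window),
                min(w.box[1] for w in window),
                max(w.box[2] for w in window),
                max(w.box[3] for w in window),
            ))
    return hits


# Hypothetical OCR output for one line of a scanned delivery challan.
page = [
    Word("Delivery", (100, 50, 180, 70)),
    Word("challan", (185, 50, 250, 70)),
    Word("No.", (255, 50, 285, 70)),
    Word("4417", (290, 50, 330, 70)),
]
print(find_phrase(page, "delivery challan"))  # [(100, 50, 250, 70)]
```

If the OCR text contains a misread (for example "challen"), the search simply finds nothing, which is why the accuracy of the background text matters.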
For many applications, the raw conversion
with uncorrected text is accurate enough. For clients needing
higher accuracy rates, Suntec will correct and proofread the
OCR output. This process is often vital for documents containing
italicized characters and small text, or for poor-quality
original documents.
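OCR accuracy rates are often expressed as character accuracy: the proofread text is treated as the reference, and the character edits needed to turn the raw OCR output into it are counted as errors. A minimal sketch, assuming Levenshtein distance as the error count (other definitions exist, and this is not necessarily how Suntec measures accuracy); the sample invoice text is hypothetical:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text: str, proofread_text: str) -> float:
    """1 - (character errors / reference length), floored at 0."""
    if not proofread_text:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, proofread_text)
    return max(0.0, 1.0 - errors / len(proofread_text))


proofread = "Invoice No. 4417, dated 3 March"
raw_ocr = "lnvoice No. 44l7, dated 3 March"  # two typical misreads: I->l, 1->l
print(f"{character_accuracy(raw_ocr, proofread):.1%}")  # 93.5%
```

Two wrong characters in a 31-character line already drop the score to about 93.5%, and one of them sits in the invoice number, the detail a user is most likely to search for.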
PDF/Searchable Image files may be indexed
for full-text retrieval by any search engine capable of indexing
PDF files.
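Search engines that index PDF files work from the hidden text layer, not the image. As a rough illustration, assuming the text of each page has already been extracted, a simple inverted index maps each word to the pages that contain it, and a multi-word query returns the pages containing all of its words. The file names and page text below are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric runs; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[tuple[str, int], str]) -> dict[str, set[tuple[str, int]]]:
    """Map each term to the set of (file, page number) locations containing it."""
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for location, text in pages.items():
        for term in tokenize(text):
            index[term].add(location)
    return index


def search(index: dict[str, set[tuple[str, int]]], query: str) -> set[tuple[str, int]]:
    """AND query: locations that contain every term of `query`."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical hidden-text content extracted from two searchable PDFs.
pages = {
    ("invoice_0412.pdf", 1): "Invoice No. 4417. Delivery challan attached.",
    ("manual_b7.pdf", 3): "Assembly drawing rev C, see delivery notes.",
}
index = build_index(pages)
print(sorted(search(index, "delivery challan")))  # [('invoice_0412.pdf', 1)]
```

Production search engines add ranking, stemming, and phrase matching on top of this idea, but retrieval still depends entirely on the OCR text that was indexed.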
Typical applications include:
- business records
- academic journals
- advertising and promotional materials
- historical materials
- handwritten materials, including color or grayscale images
PDF/Searchable Image is used globally by
governments and businesses for electronic storage and retrieval
of:
- Business Records
- CD-ROM Publishing
- Electronic Publishing
- Manufacturing and Design Documentation
- Online Content / Intranet Content
- Records Retention / Legacy Data Conversion
- Delivery Challans, Shipping Notes, and Invoices