
PEERNET.ConvertUtility

These options are used when converting to PDF with any of the PEERNET built-in converters installed with Document Conversion Service. These settings do not apply to any other output format.

Caution

This feature is not supported on Microsoft® Windows Server 2008 R2 and Microsoft® Windows 7.

 

Optical Character Recognition (OCR) searches for and recognizes text on scanned pages or images and extracts it as digital text. To recognize text, the OCR engine must know which languages to look for on the page. OCR works by analyzing the patterns, shapes, and curves of the characters on the page and matching them against predefined data for the characters of each language.

OCR increases the processing time for file conversion. Outside factors such as image quality, the font used, and any background image on the pages all affect the accuracy of the OCR results.

These settings are used by the following converters:

Built-in PDF Converter

Built-in Image Converter

Built-in Cadd Converter

Table values in bold text are the default value for that setting.

 

Conversion Settings -PNBuiltinCaddConverter

Name:

ConverterPlugIn.PNBuiltinsOCRPDF.Enabled

 

Enable this setting to run OCR on your pages and create a searchable PDF file. Running OCR slows down the conversion process. This setting does not apply to any other output format.

Values:

String value; 1 to enable OCR, 0 to disable.

 

Name:

ConverterPlugIn.PNBuiltinsOCRPDF.Languages

 

A string representing the languages to recognize on the page. English is selected by default. At least one language must be selected for OCR.

Values:

Select multiple languages by separating each language code with a plus (+) sign. For example, "eng+fra+deu" will recognize English, French, and German text on your pages.

 

Arabic (ara)

English (eng)

French (fra)

German (deu)

Hebrew (heb)

Hindi (hin)

Italian (ita)

Spanish (spa)

 

Additional languages can be downloaded and added to Document Conversion Service if necessary.
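
As a rough illustration of the plus-separated format, the following Python sketch joins language codes into a single value and rejects an empty selection. The helper name build_languages_value is hypothetical and is not part of Document Conversion Service; the bundled codes are the eight listed on this page, and the sketch does not check that any extra language files are actually installed.

```python
# Codes for the language files this page lists as included with the product.
BUNDLED_CODES = {"ara", "eng", "fra", "deu", "heb", "hin", "ita", "spa"}


def build_languages_value(codes, extra_codes=()):
    """Join language codes into a plus-separated Languages value.

    extra_codes names additional languages whose files were added
    separately; this helper does not verify that those files exist.
    """
    allowed = BUNDLED_CODES | set(extra_codes)
    cleaned = []
    for code in codes:
        code = code.strip()
        if code not in allowed:
            raise ValueError(f"unknown language code: {code!r}")
        if code not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(code)
    if not cleaned:
        raise ValueError("at least one language is required for OCR")
    return "+".join(cleaned)


print(build_languages_value(["eng", "fra", "deu"]))  # eng+fra+deu
```
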

Name:

ConverterPlugIn.PNBuiltinsOCRPDF.FirstPageOnly

 

Runs the OCR process on only the first page of each file.

Values:

String value; 1 to process only the first page, 0 to process all the pages in the document.
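
All three OCR settings are name and string-value pairs. The sketch below, written in Python purely for illustration, collects them into a dictionary with the "1"/"0" string values described in the table. The helper ocr_settings and its parameter names are hypothetical, the defaults chosen in its signature are assumptions for the example, and how the pairs are passed to a converter is not shown.

```python
def ocr_settings(enabled=True, languages="eng", first_page_only=False):
    """Return the OCR settings as a dict mapping setting names to string values."""
    if enabled and not languages:
        raise ValueError("at least one language is required for OCR")
    prefix = "ConverterPlugIn.PNBuiltinsOCRPDF."
    return {
        prefix + "Enabled": "1" if enabled else "0",
        prefix + "Languages": languages,
        prefix + "FirstPageOnly": "1" if first_page_only else "0",
    }


# Printed only for inspection; this is not a file format used by the product.
for name, value in ocr_settings(languages="eng+fra").items():
    print(name, "->", value)
```
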

 

Adding Languages

Document Conversion Service comes with files to support recognizing Arabic, English, French, German, Hebrew, Hindi, Italian, and Spanish. You can download additional language files or complete sets of language files from Traineddata Files for Tesseract.

To add them to Document Conversion Service, copy the desired *.traineddata files into the following folder:

%PROGRAMDATA%\PEERNET\Document Conversion Service\tessdata
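
The copy step can also be scripted. The following Python sketch copies every *.traineddata file from a download folder into the tessdata folder named above. The function install_language_files is a hypothetical helper, not something shipped with Document Conversion Service, and it assumes the tessdata folder already exists.

```python
import os
import shutil
from pathlib import Path

# Folder named on this page; %PROGRAMDATA% is expanded at run time on Windows.
TESSDATA_DIR = r"%PROGRAMDATA%\PEERNET\Document Conversion Service\tessdata"


def install_language_files(source_dir, dest_dir=None):
    """Copy every *.traineddata file from source_dir into dest_dir.

    dest_dir defaults to the tessdata folder above.
    Returns the sorted list of file names that were copied.
    """
    if dest_dir is None:
        dest_dir = os.path.expandvars(TESSDATA_DIR)
    dest = Path(dest_dir)
    if not dest.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {dest}")
    copied = []
    for src in sorted(Path(source_dir).glob("*.traineddata")):
        shutil.copy2(src, dest / src.name)  # overwrites a file of the same name
        copied.append(src.name)
    return copied
```

Calling install_language_files with the folder that holds the downloaded files returns the names that were copied. Depending on folder permissions, the script may need to run from an elevated prompt.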