Document Conversion Service 3.0

Built-in Converter OCR Options

These options are used when converting to PDF with any of the PEERNET built-in converters installed with Document Conversion Service. These settings do not apply to any other output format.

	Caution
This feature is not supported on Microsoft® Windows Server 2008 R2 and Microsoft® Windows 7.

Optical Character Recognition (OCR) searches for and recognizes text (characters) on scanned pages or images and extracts it as digital text. When recognizing text, the OCR engine has to know which languages to look for on the page. OCR works by analyzing the patterns, shapes, and curves of the text characters on the page and matching them to predefined information for different characters in each language.

OCR increases the processing time for file conversion. Outside factors such as image quality, the font used, and any image background on the pages all affect the accuracy of the OCR results.

The OCR options are used by the following converters:

•Built-in PDF Converter

•Built-in Image Converter

•Built-in Cadd Converter

Table values in bold text are the default value for that setting.

Sample Profile

<?xml version="1.0" encoding="utf-8"?>
<Profile Type="0"
         DisplayName="Adobe PDF OCR Searchable"
         Description="Converts to OCR (searchable) PDF.">
  ...
  </Settings>
</Profile>

Code Sample - C#

PNDocConvQueueServiceLib.PNDocConvQueueItem item = null;

// Create the conversion item
item = new PNDocConvQueueServiceLib.PNDocConvQueueItem();

// Set conversion settings: enable OCR, recognize English and French,
// and run OCR on every page rather than only the first
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.Enabled", "1");
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.Languages", "eng+fra");
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.FirstPageOnly", "0");

// Render at 300 DPI and save as a multipaged PDF
item.Set("Devmode settings;Resolution", "300");
item.Set("Save;Output File Format", "Adobe PDF Multipaged");
...

// Convert the file
item.Convert("Cadd - Builtin",
             @"C:\Test\BuildingPlan.dwf",
             @"C:\Test\Out\ConvertedDrawings");

Code Sample - VB.NET

Dim item As PNDocConvQueueServiceLib.IPNDocConvQueueItem

' Create the conversion item
item = New PNDocConvQueueServiceLib.PNDocConvQueueItem()

' Set conversion settings
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.Enabled", "1")
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.Languages", "eng+fra")
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.FirstPageOnly", "0")
item.Set("Devmode settings;Resolution", "300")
item.Set("Save;Output File Format", "Adobe PDF Multipaged")
...

' Convert the file
item.Convert("Cadd - Builtin", _
             "C:\Test\BuildingPlan.dwf", _
             "C:\Test\Out\ConvertedDrawings")

Conversion Settings - PNBuiltinCaddConverter

Name:

ConverterPlugIn.PNBuiltinsOCRPDF.Enabled

Enable this setting to run OCR on your pages and create a searchable PDF file. Running OCR slows down the conversion process. This setting applies only to PDF output.

Values:

String value; 1 to enable OCR, 0 to disable.

Name:

ConverterPlugIn.PNBuiltinsOCRPDF.Languages

A string representing the languages to recognize on the page. English is preselected by default. You must have at least one language selected for OCR.

Values:

Select multiple languages by separating each language code with a plus (+) sign. For example, "eng+fra+deu" will recognize English, French, and German text on your pages.

•Arabic (ara)

•English (eng)

•French (fra)

•German (deu)

•Hebrew (heb)

•Hindi (hin)

•Italian (ita)

•Spanish (spa)

Additional languages can be downloaded and added to Document Conversion Service if necessary.
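
The plus-separated format described above can be assembled in code rather than typed by hand. The following is a minimal sketch only; the OcrLanguages class and its Join method are hypothetical helpers for illustration and are not part of the Document Conversion Service API. It trims each code, skips blanks and repeats, and rejects an empty selection, since at least one language is required.

```csharp
using System;
using System.Collections.Generic;

public static class OcrLanguages
{
    // Hypothetical helper (not part of the Document Conversion Service API):
    // joins language codes into the plus-separated form, e.g. "eng+fra+deu".
    public static string Join(IEnumerable<string> codes)
    {
        if (codes == null) throw new ArgumentNullException(nameof(codes));

        var seen = new HashSet<string>(StringComparer.Ordinal);
        var cleaned = new List<string>();

        foreach (string code in codes)
        {
            // Skip null, empty, or whitespace-only entries.
            if (string.IsNullOrWhiteSpace(code)) continue;

            // Keep the first occurrence of each code, in the order given.
            string trimmed = code.Trim();
            if (seen.Add(trimmed)) cleaned.Add(trimmed);
        }

        // OCR needs at least one language selected.
        if (cleaned.Count == 0)
            throw new ArgumentException("At least one language code is required.", nameof(codes));

        return string.Join("+", cleaned);
    }
}
```

The returned string can then be passed as the value of ConverterPlugIn.PNBuiltinsOCRPDF.Languages, in the same way the code samples pass "eng+fra" to item.Set.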

Name:

ConverterPlugIn.PNBuiltinsOCRPDF.FirstPageOnly

Runs OCR on only the first page of each file.

Values:

String value; 1 to process only the first page, 0 to process all the pages in the document.
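
Because all three OCR settings take string values, with "1" and "0" standing in for on and off, it can help to keep the conversion from typed options in one place. The sketch below assumes a hypothetical OcrSettings.Build helper, which is not part of the product API; it only produces the name/value pairs documented above.

```csharp
using System.Collections.Generic;

public static class OcrSettings
{
    // Hypothetical helper (not part of the Document Conversion Service API):
    // maps typed options to the string name/value pairs documented above.
    public static Dictionary<string, string> Build(bool enabled, string languages, bool firstPageOnly)
    {
        const string prefix = "ConverterPlugIn.PNBuiltinsOCRPDF.";

        return new Dictionary<string, string>
        {
            // "1" enables OCR, "0" disables it.
            { prefix + "Enabled", enabled ? "1" : "0" },
            // Plus-separated language codes, for example "eng+fra".
            { prefix + "Languages", languages },
            // "1" processes only the first page, "0" processes all pages.
            { prefix + "FirstPageOnly", firstPageOnly ? "1" : "0" },
        };
    }
}
```

Each pair in the returned dictionary can be applied with item.Set(name, value), as in the code samples.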

Adding Languages

Document Conversion Service comes with files to support recognizing Arabic, English, French, German, Hebrew, Hindi, Italian, and Spanish. You can download additional language files or complete sets of language files from Traineddata Files for Tesseract.

To add them to Document Conversion Service, copy the desired *.traineddata files into the following folder:

%PROGRAMDATA%\PEERNET\Document Conversion Service\tessdata
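
The copy step can also be scripted. The following is a rough sketch, assuming the downloaded *.traineddata files sit in a folder of your choosing; the TessdataInstaller class is a hypothetical helper, not something shipped with Document Conversion Service. Overwriting files that already exist is an assumption made here for simplicity, and writing under %PROGRAMDATA% may require administrator rights.

```csharp
using System;
using System.IO;

public static class TessdataInstaller
{
    // Resolves %PROGRAMDATA%\PEERNET\Document Conversion Service\tessdata.
    public static string DefaultTessdataFolder()
    {
        string programData = Environment.GetFolderPath(
            Environment.SpecialFolder.CommonApplicationData);
        return Path.Combine(programData, "PEERNET", "Document Conversion Service", "tessdata");
    }

    // Copies every *.traineddata file from sourceFolder into targetFolder
    // and returns the number of files copied.
    public static int CopyLanguageFiles(string sourceFolder, string targetFolder)
    {
        if (!Directory.Exists(targetFolder))
            throw new DirectoryNotFoundException("Target folder not found: " + targetFolder);

        int copied = 0;
        foreach (string file in Directory.GetFiles(sourceFolder, "*.traineddata"))
        {
            string destination = Path.Combine(targetFolder, Path.GetFileName(file));

            // Assumption: replace any existing file with the same name.
            File.Copy(file, destination, true);
            copied++;
        }
        return copied;
    }
}
```

A call such as TessdataInstaller.CopyLanguageFiles(downloadFolder, TessdataInstaller.DefaultTessdataFolder()) would copy the files into the folder named above, where downloadFolder is a placeholder for wherever the files were saved.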
