
Document Conversion Service 3.0


Built-in Converter OCR Options


These options are used when converting to PDF with any of the PEERNET built-in converters installed with Document Conversion Service. These settings do not apply to any other output format.

Caution

This feature is not supported on Microsoft® Windows Server 2008 R2 and Microsoft® Windows 7.

 

Optical Character Recognition (OCR) searches for and recognizes text (characters) on scanned pages or images and extracts it as digital text. When recognizing text, the OCR engine has to know which languages to look for on the page. OCR works by analyzing the patterns, shapes, and curves of the characters on the page and matching them against stored character data for each language.

OCR increases the processing time for file conversion. External factors such as image quality, the font used, and any background image on the pages all affect the accuracy of the OCR results.

The OCR options are used by the following converters:

Built-in PDF Converter

Built-in Image Converter

Built-in Cadd Converter

Table values in bold text are the default value for that setting.

 

Sample Profile

 

<?xml version="1.0" encoding="utf-8"?>
<Profile Type="0"
         DisplayName="Adobe PDF OCR Searchable"
         Description="Converts to OCR (searchable) PDF.">
<Settings>
  <add Name="ConverterPlugIn.PNBuiltinsOCRPDF.Enabled" Value="1"/>
  <add Name="ConverterPlugIn.PNBuiltinsOCRPDF.Languages" Value="eng+fra"/>
  <add Name="ConverterPlugIn.PNBuiltinsOCRPDF.FirstPageOnly" Value="0"/>

  <!-- Output file options -->
  <add Name="Devmode settings;Resolution" Value="300"/>
  <add Name="Save;Output File Format" Value="Adobe PDF Multipaged"/>
  ...
</Settings>
</Profile>

 

 

Code Sample - C#

 
PNDocConvQueueServiceLib.PNDocConvQueueItem item = null;

// Create the conversion item
item = new PNDocConvQueueServiceLib.PNDocConvQueueItem();

// Set conversion settings
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.Enabled", "1");
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.Languages", "eng+fra");
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.FirstPageOnly", "0");

item.Set("Devmode settings;Resolution", "300");
item.Set("Save;Output File Format", "Adobe PDF Multipaged");
...
// Convert the file (C# needs no line-continuation characters)
item.Convert("Cadd - Builtin",
             @"C:\Test\BuildingPlan.dwf",
             @"C:\Test\Out\ConvertedDrawings");

 

Code Sample - VB.NET

 

Dim item As PNDocConvQueueServiceLib.IPNDocConvQueueItem

' Create the conversion item
item = New PNDocConvQueueServiceLib.PNDocConvQueueItem()

' Set conversion settings
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.Enabled", "1")
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.Languages", "eng+fra")
item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.FirstPageOnly", "0")

item.Set("Devmode settings;Resolution", "300")
item.Set("Save;Output File Format", "Adobe PDF Multipaged")
...
' Convert the file
item.Convert("Cadd - Builtin", _
             "C:\Test\BuildingPlan.dwf", _
             "C:\Test\Out\ConvertedDrawings")

 

 

Conversion Settings - PNBuiltinCaddConverter

Name:

ConverterPlugIn.PNBuiltinsOCRPDF.Enabled

 

Enable this setting to run OCR on your pages and create a searchable PDF file. Running OCR slows down the conversion process. This setting does not apply to any other output format.

Values:

String value; 1 to enable OCR, 0 to disable.

 

Name:

ConverterPlugIn.PNBuiltinsOCRPDF.Languages

 

A string representing the languages to recognize on the page. English is preselected by default. You must have at least one language selected for OCR.

Values:

Select multiple languages by separating the language codes with a plus (+) sign. For example, "eng+fra+deu" recognizes English, French, and German text on your pages.

 

Arabic (ara)

English (eng)

French (fra)

German (deu)

Hebrew (heb)

Hindi (hin)

Italian (ita)

Spanish (spa)

 

Additional languages can be downloaded and added to Document Conversion Service if necessary.
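Because the Languages value is a single plus-separated string, it can be convenient to build it from a list of codes in code. The C# sketch below is illustrative only: OcrLanguageString and Build are hypothetical names and are not part of Document Conversion Service. It trims each code, skips blanks and duplicates, rejects a code that already contains a plus sign, and refuses an empty list, since at least one language must be selected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the Document Conversion Service API.
// Builds the value for ConverterPlugIn.PNBuiltinsOCRPDF.Languages, e.g.
//   item.Set("ConverterPlugIn.PNBuiltinsOCRPDF.Languages",
//            OcrLanguageString.Build(new[] { "eng", "fra", "deu" }));
public static class OcrLanguageString
{
    // Joins language codes with a plus (+) sign, keeping the order given.
    // Codes are trimmed; blank entries and exact duplicates are skipped.
    public static string Build(IEnumerable<string> codes)
    {
        if (codes == null) throw new ArgumentNullException(nameof(codes));

        var seen = new HashSet<string>(StringComparer.Ordinal);
        var ordered = new List<string>();

        foreach (string code in codes)
        {
            if (string.IsNullOrWhiteSpace(code)) continue;

            string trimmed = code.Trim();

            // A '+' inside a code would be read as a separator, so reject it.
            if (trimmed.IndexOf('+') >= 0)
                throw new ArgumentException("Language codes must not contain '+': " + trimmed, nameof(codes));

            if (seen.Add(trimmed)) ordered.Add(trimmed);
        }

        // OCR needs at least one language.
        if (ordered.Count == 0)
            throw new ArgumentException("At least one OCR language code is required.", nameof(codes));

        return string.Join("+", ordered);
    }
}
```

The returned string is passed to item.Set in the same way as the literal "eng+fra" in the code samples.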

Name:

ConverterPlugIn.PNBuiltinsOCRPDF.FirstPageOnly

 

Allows you to run OCR on only the first page of each file.

Values:

String value; 1 to process only the first page, 0 to process all the pages in the document.
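All three OCR settings take string values, so a boolean option has to be turned into "1" or "0" before it reaches item.Set. The C# sketch below shows one way to set the three values together; OcrSettings and Apply are hypothetical names. The helper only hands name/value pairs to a caller-supplied delegate, so it assumes nothing about the queue item beyond the Set(name, value) calls shown in the samples.

```csharp
using System;

// Hypothetical helper; not part of the Document Conversion Service API.
// It passes each OCR setting name/value pair to a caller-supplied delegate, e.g.
//   OcrSettings.Apply((name, value) => item.Set(name, value), true, "eng+fra", false);
public static class OcrSettings
{
    private const string Prefix = "ConverterPlugIn.PNBuiltinsOCRPDF.";

    public static void Apply(Action<string, string> set, bool enabled, string languages, bool firstPageOnly)
    {
        if (set == null) throw new ArgumentNullException(nameof(set));

        // At least one language must be selected for OCR.
        if (string.IsNullOrWhiteSpace(languages))
            throw new ArgumentException("At least one OCR language is required.", nameof(languages));

        // The settings are strings: "1" turns an option on, "0" turns it off.
        set(Prefix + "Enabled", enabled ? "1" : "0");
        set(Prefix + "Languages", languages);
        set(Prefix + "FirstPageOnly", firstPageOnly ? "1" : "0");
    }
}
```

Using a delegate keeps the sketch independent of the COM interop types, and the lambda form also works if Set happens to return a value.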

 

Adding Languages

Document Conversion Service comes with files to support recognizing Arabic, English, French, German, Hebrew, Hindi, Italian, and Spanish. You can download additional language files, or complete sets of language files, from the Traineddata Files for Tesseract page.

To add them to Document Conversion Service, copy the desired *.traineddata files into the following folder:

%PROGRAMDATA%\PEERNET\Document Conversion Service\tessdata
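Copying the files can also be scripted. The C# sketch below is one possible approach; TessdataFolder, CopyLanguageFiles, and InstalledLanguages are hypothetical names, and the download folder you copy from is up to you. It expands %PROGRAMDATA% with Environment.ExpandEnvironmentVariables, copies every *.traineddata file into the tessdata folder, and lists the language codes found there, assuming the usual Tesseract convention that a file such as eng.traineddata provides the code eng. Writing to this folder may require administrator rights.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper; not part of the Document Conversion Service API.
// Usage example (the download folder is an assumption):
//   TessdataFolder.CopyLanguageFiles(@"C:\Downloads\tessdata", TessdataFolder.DefaultPath);
public static class TessdataFolder
{
    // Folder named in this topic; %PROGRAMDATA% is expanded at run time.
    public const string DefaultPath = @"%PROGRAMDATA%\PEERNET\Document Conversion Service\tessdata";

    // Copies every *.traineddata file from sourceDir into tessdataDir,
    // overwriting files with the same name. Returns the copied file names.
    public static List<string> CopyLanguageFiles(string sourceDir, string tessdataDir)
    {
        string target = Environment.ExpandEnvironmentVariables(tessdataDir);
        Directory.CreateDirectory(target); // does nothing if the folder already exists

        var copied = new List<string>();
        foreach (string file in Directory.GetFiles(sourceDir, "*.traineddata"))
        {
            string name = Path.GetFileName(file);
            File.Copy(file, Path.Combine(target, name), true);
            copied.Add(name);
        }
        return copied;
    }

    // Lists the language codes present in tessdataDir, assuming the Tesseract
    // convention that eng.traineddata provides the code "eng".
    public static List<string> InstalledLanguages(string tessdataDir)
    {
        string dir = Environment.ExpandEnvironmentVariables(tessdataDir);
        if (!Directory.Exists(dir)) return new List<string>();

        return Directory.GetFiles(dir, "*.traineddata")
            .Select(f => Path.GetFileNameWithoutExtension(f))
            .OrderBy(code => code, StringComparer.Ordinal)
            .ToList();
    }
}
```

Checking InstalledLanguages before setting ConverterPlugIn.PNBuiltinsOCRPDF.Languages is one way to catch a misspelled or missing language code early.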