public abstract class AbstractTesseract4OcrEngine extends Object implements IOcrEngine, IThreadLocalMetaInfoAware
The implementation of IOcrEngine.
This class provides possibilities to perform OCR, to read data from input
files and to return the contained text in the required format.
It also provides access to features of "tesseract"
(an optical character recognition engine for various operating systems).

| Constructor and Description |
|---|
| AbstractTesseract4OcrEngine(Tesseract4OcrEngineProperties tesseract4OcrEngineProperties) |

| Modifier and Type | Method and Description |
|---|---|
| void | createTxtFile(List<File> inputImages, File txtFile): Performs OCR using provided IOcrEngine for the given list of input images and saves output to a text file using provided path. |
| Map<Integer,List<TextInfo>> | doImageOcr(File input): Reads data from the provided input image file and returns the retrieved data in the format described below. |
| String | doImageOcr(File input, OutputFormat outputFormat): Reads data from the provided input image file and returns the retrieved data as a string. |
| void | doTesseractOcr(File inputImage, File outputFile, OutputFormat outputFormat): Performs tesseract OCR for the first (or for the only) image page. |
| String | getLanguagesAsString(): Gets the list of languages concatenated with the "+" symbol into a string in the format required by tesseract. |
| Tesseract4OcrEngineProperties | getTesseract4OcrEngineProperties(): Gets properties for AbstractTesseract4OcrEngine. |
| com.itextpdf.kernel.counter.event.IMetaInfo | getThreadLocalMetaInfo() |
| String | identifyOsType(): Identifies the type of the current OS and returns it (win, linux). |
| boolean | isWindows(): Checks whether the current OS is Windows. |
| void | setTesseract4OcrEngineProperties(Tesseract4OcrEngineProperties tesseract4OcrEngineProperties): Sets properties for AbstractTesseract4OcrEngine. |
| IThreadLocalMetaInfoAware | setThreadLocalMetaInfo(com.itextpdf.kernel.counter.event.IMetaInfo metaInfo) |
| void | validateLanguages(List<String> languagesList): Validates the list of provided languages and checks that they all exist in the given tess data directory. |
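
getLanguagesAsString() and validateLanguages(List<String>) both work on the engine's language list. The sketch below is a minimal, self-contained illustration, not iText's implementation. It relies on two tesseract conventions: multiple languages passed to tesseract's -l option are joined with "+", and each language needs a <lang>.traineddata file in the tess data directory. The class TesseractLanguageSketch and its methods are hypothetical names; the documented validateLanguages throws Tesseract4OcrException instead of returning the missing languages.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; not part of iText pdfOCR.
public class TesseractLanguageSketch {

    // Joins language codes with "+", the separator tesseract's -l option uses,
    // e.g. [eng, deu] -> "eng+deu".
    public static String languagesAsString(List<String> languages) {
        return String.join("+", languages);
    }

    // Returns every language that has no "<lang>.traineddata" file in tessDataDir.
    // The documented validateLanguages throws Tesseract4OcrException in this
    // situation; returning a list just keeps the sketch dependency-free.
    public static List<String> findMissingLanguages(File tessDataDir, List<String> languages) {
        List<String> missing = new ArrayList<>();
        for (String language : languages) {
            File trainedData = new File(tessDataDir, language + ".traineddata");
            if (!trainedData.isFile()) {
                missing.add(language);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> languages = Arrays.asList("eng", "deu");
        System.out.println(languagesAsString(languages)); // eng+deu

        // "tessdata" is only an example default; pass the real tess data path as the first argument.
        File tessDataDir = new File(args.length > 0 ? args[0] : "tessdata");
        List<String> missing = findMissingLanguages(tessDataDir, languages);
        if (missing.isEmpty()) {
            System.out.println("All languages found in " + tessDataDir);
        } else {
            System.out.println("No tess data found for: " + missing);
        }
    }
}
```

Checking for the .traineddata files up front gives a clear error before tesseract is ever invoked.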
public AbstractTesseract4OcrEngine(Tesseract4OcrEngineProperties tesseract4OcrEngineProperties)

public void doTesseractOcr(File inputImage, File outputFile, OutputFormat outputFormat)
Performs tesseract OCR for the first (or for the only) image page.
Parameters: inputImage - input image File; outputFile - output file for the result for the first page; outputFormat - selected OutputFormat for tesseract

public void createTxtFile(List<File> inputImages, File txtFile)
Performs OCR using provided IOcrEngine for the given list of input images and saves output to a text file using provided path.
Specified by: createTxtFile in interface IOcrEngine
Parameters: inputImages - List of images to be OCRed; txtFile - file to be created

public final Tesseract4OcrEngineProperties getTesseract4OcrEngineProperties()
Gets properties for AbstractTesseract4OcrEngine.
Returns: Tesseract4OcrEngineProperties

public final void setTesseract4OcrEngineProperties(Tesseract4OcrEngineProperties tesseract4OcrEngineProperties)
Sets properties for AbstractTesseract4OcrEngine.
Parameters: tesseract4OcrEngineProperties - set of properties (Tesseract4OcrEngineProperties) for AbstractTesseract4OcrEngine

public final String getLanguagesAsString()
Gets the list of languages concatenated with the "+" symbol into a string in the format required by tesseract.
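Returns: String of concatenated languages

public final Map<Integer,List<TextInfo>> doImageOcr(File input)
Reads data from the provided input image file and returns the retrieved data in the format described below.
Specified by: doImageOcr in interface IOcrEngine
Parameters: input - input image File
Returns: Map where the key is an Integer representing the page number and the value is a List of TextInfo elements, each containing a word or a line and its 4 coordinates (bbox)

public final String doImageOcr(File input, OutputFormat outputFormat)
Reads data from the provided input image file and returns the retrieved data as a string.
Parameters: input - input image File; outputFormat - OutputFormat of the returned result
Returns: String that is returned after processing the given image

public boolean isWindows()
Checks whether the current OS is Windows.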
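
The Map<Integer,List<TextInfo>> returned by doImageOcr(File) is keyed by page number, with one list of recognized words or lines per page. The sketch below shows one way such a map could be flattened into plain text, in the spirit of createTxtFile. It does not use iText: SimpleTextInfo is a hypothetical stand-in for TextInfo, and the layout (one item per line, an extra newline between pages) is an assumption, not the format createTxtFile actually writes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not part of iText pdfOCR.
public class OcrResultSketch {

    // Stand-in for iText's TextInfo: a recognized word or line plus its
    // four bounding-box coordinates.
    public static final class SimpleTextInfo {
        public final String text;
        public final float[] bbox;

        public SimpleTextInfo(String text, float[] bbox) {
            this.text = text;
            this.bbox = bbox;
        }
    }

    // Writes each page's text items one per line, pages in ascending
    // page-number order, consecutive pages separated by an extra newline.
    public static String toPlainText(Map<Integer, List<SimpleTextInfo>> pages) {
        List<String> pageTexts = new ArrayList<>();
        // TreeMap sorts entries by key, so page order does not depend on the input map type.
        for (Map.Entry<Integer, List<SimpleTextInfo>> page : new TreeMap<>(pages).entrySet()) {
            StringBuilder pageText = new StringBuilder();
            for (SimpleTextInfo item : page.getValue()) {
                pageText.append(item.text).append('\n');
            }
            pageTexts.add(pageText.toString());
        }
        return String.join("\n", pageTexts);
    }

    public static void main(String[] args) {
        Map<Integer, List<SimpleTextInfo>> pages = new HashMap<>();
        pages.put(2, Arrays.asList(new SimpleTextInfo("Second page", new float[] {10f, 10f, 120f, 30f})));
        pages.put(1, Arrays.asList(
                new SimpleTextInfo("Hello", new float[] {10f, 10f, 60f, 30f}),
                new SimpleTextInfo("World", new float[] {70f, 10f, 130f, 30f})));
        System.out.print(toPlainText(pages));
    }
}
```

Copying into a TreeMap keeps the output in page order regardless of whether the incoming map preserves insertion or key order.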

public String identifyOsType()
Identifies the type of the current OS and returns it (win, linux).
Returns: String with the current OS type

public void validateLanguages(List<String> languagesList) throws Tesseract4OcrException
Validates the list of provided languages and checks that they all exist in the given tess data directory.
Parameters: languagesList - List of provided languages
Throws: Tesseract4OcrException - if tess data wasn't found for one of the languages from the provided list

public com.itextpdf.kernel.counter.event.IMetaInfo getThreadLocalMetaInfo()
Specified by: getThreadLocalMetaInfo in interface IThreadLocalMetaInfoAware

public IThreadLocalMetaInfoAware setThreadLocalMetaInfo(com.itextpdf.kernel.counter.event.IMetaInfo metaInfo)
Specified by: setThreadLocalMetaInfo in interface IThreadLocalMetaInfoAware

Copyright © 1998–2021 iText Group NV. All rights reserved.