Intranda’s OCR service includes a number of components that provide full-text recognition of text-based materials. The service is fully integrated into Goobi. The OCR process is fully automated and runs in the background within the workflow. You can also use the complete OCR service without Goobi. This section contains more details about intranda’s OCR service and the options available to you.
Integrating the OCR service into Goobi
The OCR service can be fully integrated into Goobi. To do so, you configure the relevant workflow step so that information about the desired OCR results is passed to the service. This information includes:
- the image set being analysed (master images, derivatives, etc.)
- the font (Antiqua, Fraktur)
- the language of the material
- the target format for the OCR results (PDF, TXT, DOC, TEI, ALTO, XML, etc.)
- the priority
These parameters need only be set once for the workflow or when you add new volumes to Goobi. Thereafter, the OCR service will run automatically in the background when the configured status is reached within the workflow.
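The parameters listed above can be pictured as a small settings record that is defined once and then reused for every volume. The following is a minimal sketch only: the class name `OcrJobSettings`, its field names, and the allowed value sets are hypothetical and are not taken from the actual Goobi or OCR service configuration.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical value sets for illustration; the real service may name or
# encode these options differently.
FONTS = {"Antiqua", "Fraktur"}
FORMATS = {"PDF", "TXT", "DOC", "TEI", "ALTO", "XML"}


@dataclass(frozen=True)
class OcrJobSettings:
    """Illustrative container for the OCR parameters of one workflow."""
    image_set: str          # which images to analyse, e.g. "master" or "derivative"
    font: str               # typeface of the material
    language: str           # language of the material
    target_formats: tuple   # requested output formats
    priority: int = 0       # higher value = processed earlier (assumption)

    def __post_init__(self):
        # Reject values outside the illustrative sets defined above.
        if self.font not in FONTS:
            raise ValueError(f"unknown font: {self.font}")
        unknown = set(self.target_formats) - FORMATS
        if unknown:
            raise ValueError(f"unknown target formats: {sorted(unknown)}")

    def to_json(self) -> str:
        # Serialise the settings so they could be handed to another system.
        return json.dumps(asdict(self), sort_keys=True)


settings = OcrJobSettings("master", "Fraktur", "de", ("ALTO", "PDF"), priority=5)
print(settings.to_json())
```

Validating the settings when they are created mirrors the idea that the parameters are set once per workflow, so a mistake is caught at configuration time rather than during background processing.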
Integrating the OCR service into other applications
Integrating the OCR service into any other application is straightforward. Full-text recognition can be triggered from any application, either by calling a web service or through a platform-independent command-line interface. You can therefore submit large volumes of images from any application for automated OCR batch processing.
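To illustrate what batch submission through a web service might look like from a calling application, the sketch below prepares one HTTP request per image folder. It is an assumption-laden example: the endpoint URL `OCR_ENDPOINT`, the JSON field names, and the helper functions are all hypothetical, since the real interface of the OCR service is not described here. The requests are only built, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the address of the real service.
OCR_ENDPOINT = "http://ocr.example.org/api/jobs"


def build_ocr_request(image_dir, language, target_format):
    """Build (but do not send) a POST request for one folder of images."""
    # Field names are illustrative assumptions, not the service's real schema.
    payload = {"imageDir": image_dir, "language": language, "format": target_format}
    return urllib.request.Request(
        OCR_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_batch(image_dirs, language, target_format):
    """Prepare one request per image folder for batch processing."""
    return [build_ocr_request(d, language, target_format) for d in image_dirs]


requests_to_send = build_batch(["/scans/vol1", "/scans/vol2"], "de", "ALTO")
print(len(requests_to_send), "requests prepared")
```

In a real integration each prepared request would be passed to `urllib.request.urlopen`, with the URL and payload adapted to whatever the service actually expects.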
Please contact us if you have any questions.
Once the application into which the service has been integrated has generated individual OCR orders, they join the OCR service queue. The queue is managed through a web-based management program. On the rare occasions when you wish to check the processing sequence or view the volume during processing, you can do so from any web browser.