Documentation for the Goobi plugin LayoutWizzard released
You have probably already heard about our work on page recognition at events over the past few months, and may have had the opportunity to take a look at the LayoutWizzard interface. Because interest in this development has grown steadily in recent months, we have written detailed documentation for this still fairly new Goobi plugin and published it here.
In case you didn’t know anything about LayoutWizzard yet, here’s a short summary of what it’s actually used for:
Generous scanning including black borders
In our opinion, master digital copies should be scanned with a generous frame around the individual pages. This ensures that the digital copies always contain everything relevant and that nothing is accidentally removed by a wrongly set frame or by automatic cropping.
Automatic image analysis
LayoutWizzard's automatic image analysis checks all digitized images, determines the actual page in each one and straightens it. The straightening is based on an analysis of the entire page, not just the printed text on it. The position of the book fold is then determined depending on whether the page is a left or a right page.
Visual inspection and correction of analysis results
Following the automatic image analysis, a Goobi user can check the determined values in LayoutWizzard. If a page deviates seriously from the average of its adjacent pages, or if the analysis was not certain that a recognition was correct, the user can intervene manually. In this way, a user confirms or corrects the analysis results before they are applied. Because left and right pages are checked separately in a list view showing several pages at once, even this check is extremely efficient.
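To make the idea of flagging pages for review more concrete, here is a minimal sketch in Python. It is not LayoutWizzard's actual code: the function name `flag_for_review`, the parameters `window`, `max_rel_dev` and `min_conf`, and the idea of a per-page confidence score are all assumptions made for illustration. The sketch also compares each page with the median of its neighbours rather than the plain average, because a median is less distorted when one of the neighbours is itself an outlier.

```python
from statistics import median


def flag_for_review(widths, confidences, window=2, max_rel_dev=0.05, min_conf=0.8):
    """Return the indices of pages that a user should check manually.

    widths      -- detected page width per image, in pixels (hypothetical input)
    confidences -- analysis confidence per image, 0.0 to 1.0 (hypothetical input)
    """
    flagged = []
    for i, width in enumerate(widths):
        # Neighbours within `window` pages on either side, excluding the page itself.
        start, stop = max(0, i - window), min(len(widths), i + window + 1)
        neighbours = [widths[j] for j in range(start, stop) if j != i]

        deviates = False
        if neighbours:
            reference = median(neighbours)
            deviates = abs(width - reference) / reference > max_rel_dev

        uncertain = confidences[i] < min_conf
        if deviates or uncertain:
            flagged.append(i)
    return flagged


# Example: page 3 is much wider than its neighbours, page 5 has low confidence.
widths = [2000, 2010, 1990, 2600, 2005, 1995, 2000]
confidences = [0.95, 0.9, 0.9, 0.9, 0.9, 0.6, 0.9]
review = flag_for_review(widths, confidences)
```

In practice such a check would probably be run separately for left and right pages, since the two sides are reviewed separately as well.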
Automatic cropping of images into a separate derivative
Finally, once a user has confirmed or corrected the analysis results, this information is used for the actual cropping. As with the analysis, this step runs within the TaskManager as an independent plugin. The determined and confirmed values of the analysis are now used to cut the actual page out of the existing master digital copies (including the generous frames) and save it as an independent derivative in a different directory.
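The principle of this step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the plugin's implementation: the image is represented as a plain list of pixel rows, the function name `crop_page` and the `(left, top, right, bottom)` box format are hypothetical, and the straightening (rotation) of the page is left out entirely. What it shows is that the crop produces a new image while the master stays untouched.

```python
def crop_page(master, box):
    """Return a new image (list of pixel rows) that covers only `box`.

    `box` is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    The master image itself is not modified.
    """
    if not master or not master[0]:
        raise ValueError("master image is empty")
    left, top, right, bottom = box
    height, width = len(master), len(master[0])
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError(f"crop box {box} lies outside the {width}x{height} master")
    # Slicing copies the selected rows and columns into new lists.
    return [row[left:right] for row in master[top:bottom]]


# Tiny 6x4 "scan": 0 = black border, 1 = page content.
master = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
derivative = crop_page(master, (1, 1, 5, 3))
```

Here `derivative` contains only the page content, while `master` still includes the black border.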
Cropped end result alongside the master images
The end result for the Goobi user is therefore both the original master folder with the generously scanned digital copies, including the black frame, book fold, etc., and a separate folder with the cropped individual pages. These pages are straightened and show neither black borders nor any part of the opposite page beyond the book fold.
The derivative that LayoutWizzard creates in addition to the master digital copies makes the subsequent steps much easier to work with. Besides the smaller file size, the corrected alignment and the lower toner consumption when printing, these images also give significantly better text recognition (OCR) results, and a generated e-book, e.g. an EPUB file, is of higher quality. Last but not least, your long-term archive contains the generously scanned master digital copy as well as the cropped version of the images. Better safe than sorry.
If you would like to learn more about LayoutWizzard, just have a look at its documentation. This can be found at the following address: