Goobi and UCC set new benchmark for newspaper digitisation at 742 pages per hour
Development work on version 2.0 of the Universal Capturing Client (UCC) is now complete, and the new version introduces several new and surprising features for all users. As well as greatly improving usability, many of the new functions will be particularly handy for those who use the application every day. We will provide full details when version 2.0 is officially released. For now, it is undergoing final quality checks in continuous production use. This involves scanning around 300,000 pages of large-format newspaper volumes. Goobi handles workflow management, metadata organisation, project coordination and data exchange, while the UCC handles the actual scanning and the parallel capture of structure data for the volumes.
Putting our own software through its paces under production conditions has given us a clear idea not only of the improvements users would like to see but also of the tremendous potential that can be realised by making just a few carefully considered changes.
We would like to draw your attention now to one particular feature of UCC version 2.0 that is likely to have by far the biggest impact on your working day: macro functionality.
By using flexible, individually configurable macros, we have made simultaneous scanning and metadata indexing with the UCC even faster. In newspaper digitisation, our new macro mechanism helped us set a new speed record for scanning large-format newspaper volumes. For this, we use a conventional Bookeye 4 V1 scanner from Image Access and index the structure data and metadata during scanning. The structure data comprise the issues and various types of supplements. These change approximately every four pages, and some of them start a new pagination. We also capture metadata during scanning, including the issue number, the issue date in both standardised and written form and, depending on the supplement type, the supplement title. Based on the volumes processed so far, each year (i.e. each UCC source work or Goobi process) contains around 2,800 pages with approximately 650 structure elements and 1,300 metadata entries.
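The per-year figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only. It assumes that the headline rate of 742 pages per hour holds uniformly across a whole volume, and the function name `volume_profile` is hypothetical. It is not part of Goobi or the UCC.

```python
# Figures quoted in the article for one newspaper year
# (one UCC source work / one Goobi process).
PAGES_PER_YEAR = 2_800
STRUCTURE_ELEMENTS_PER_YEAR = 650
METADATA_PER_YEAR = 1_300
# Headline throughput; ASSUMED here to be constant over a whole volume.
PAGES_PER_HOUR = 742


def volume_profile(pages, structure_elements, metadata, pages_per_hour):
    """Hypothetical helper: derive rough per-volume averages from the totals."""
    return {
        # Average run of pages between structure changes (issues, supplements).
        "pages_per_structure_element": pages / structure_elements,
        # Average number of metadata entries recorded per structure element.
        "metadata_per_structure_element": metadata / structure_elements,
        # Scanning time for one year at the assumed constant rate.
        "hours_per_volume": pages / pages_per_hour,
        # Structure elements that must be indexed per hour at that rate.
        "structure_elements_per_hour": structure_elements * pages_per_hour / pages,
    }


if __name__ == "__main__":
    profile = volume_profile(PAGES_PER_YEAR, STRUCTURE_ELEMENTS_PER_YEAR,
                             METADATA_PER_YEAR, PAGES_PER_HOUR)
    for name, value in profile.items():
        print(f"{name}: {value:.2f}")
```

On these averages, a structure element begins roughly every 4.3 pages, which is consistent with the "approximately every four pages" above. Each element carries about two metadata entries. At a constant 742 pages per hour, one year would take a little under four hours of scanning, which means indexing around 172 structure elements per hour alongside the scanning itself. Real sessions will vary with page handling and breaks.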