Overview of the First intranda Goobi Day
intranda’s first-ever Goobi Day took place in Göttingen on 1 October 2014. Invitations were sent to all groups with an interest in Goobi. The event proved very popular, attracting 56 participants from a wide range of backgrounds, representing 29 organisations.
The event required extensive preparation. intranda’s conference room was enlarged by twenty square metres specifically for the occasion; without this extension, we could not have accommodated the large number of visitors expected to attend.
Shortly before the event, once the main building work was complete, we turned our attention to the remaining arrangements: installing projectors and screens and hanging new pictures. We also set up a large pool of scanners so that we could demonstrate how they integrate with Goobi.
The event was designed to cover the entire field of digitisation, and of course the scanning hardware plays a crucial role in determining the outcome of all subsequent digitisation activities. We were particularly grateful to the companies Image Access and SMA for lending us a number of machines.
Although the official opening and welcome were scheduled for 9.30, the first participants began arriving at 9.00. They spread out over both floors, filled the cloakrooms and quickly created a lively atmosphere. By shortly after 10.00, nearly all the participants had arrived, so it was time to get started. intranda’s managing director and software developer Steffen Hankiewicz formally welcomed participants and opened the event.
After announcing yet another last-minute addition to the programme in the form of an extra presentation, he explained how the day would be organised, gave details of the WLAN and catering arrangements, and expressed his hope that everyone would take an active part.
From tool to trend – Workflow systems as enablers
One of our guest speakers was intranda co-founder Markus Enders from the British Library. He began his talk with a brief review of the situation when Goobi was first developed back in 2004. Although many participants will have been familiar with how the methodology evolved, some were no doubt surprised to be reminded that it originally relied on proprietary files to capture workflow status and metadata.
Markus Enders continued with a detailed analysis of current issues, challenges and development plans in the context of various international digitisation projects. As well as PREMIS, JPEG 2000 and ALTO, he focused particularly on the latest developments in the International Image Interoperability Framework (IIIF), which numerous institutions worldwide have now adopted as a way of providing persistent addressability for individual pages and page regions.
OCR, what now?
The next presentation was given by Oliver Paetzel, a software developer at intranda, who reported on the current state of intranda’s work on named entity recognition. He explained the main principles behind named entities and how they can be recognised in running text. Using specific examples, he showed how they can be used to extract additional information from the OCR results that many institutions have already produced with the help of Goobi and the intranda TaskManager. To illustrate the methodology, he demonstrated how the intranda NEAT software can be used for collaborative training on ALTO 2.1 files in order to refine the algorithms that recognise named entities. Building on these training results, he then explained how named entity recognition can be fully automated and run directly from Goobi.
Oliver Paetzel drew attention to the benefits of this method, noting that linking the recognised entities to identifiers from authority files opens up new research opportunities. Using the GND as an example, he gave a live demonstration of the information that can be obtained through linked open data and how it can be made available to users alongside the named entities.
AREDO – The German National Library’s cooperation model for long-term digital archiving
Another guest speaker was Karlheinz Schmitt, who gave a presentation on AREDO, the long-term digital archiving service provided by the German National Library. He outlined both the technical and organisational aspects of the service, stressed the importance the Library attaches to cooperation with other institutions and analysed the benefits and synergies that this cooperation creates for all concerned.
Drawing on practical, easily understood examples, he explained the various challenges that need to be overcome in long-term digital archiving. During a lively discussion with the audience, he set out very clearly the complexities involved and how the German National Library is dealing with them.
Teaching and learning digitisation skills – Questions and answers on the training required for all aspects of digitisation projects and for Goobi
After a short coffee break, it fell to Jan Vonde (system administrator at intranda) to report on the need for training to support digitisation projects. He talked about previous Goobi training courses, what had actually been taught and what such courses generally cover. He also stressed the steady growth in demand for training in recent years. With the introduction of new functions in Goobi and the need to cover additional subjects such as the intranda viewer and the intranda TaskManager, the time available for training is clearly barely sufficient to cover each area adequately. Furthermore, with digitisation projects now running over longer periods and most institutions having to deal with staff turnover, there is a constant demand for training in different fields.
Jan Vonde outlined intranda’s plans to respond to the growing demand for training and how the courses would be structured. Courses will be held in the new training room, which is equipped with its own scanning hardware. This generated a lively discussion among participants, who offered some very useful feedback on which subjects needed to be covered and how courses should be organised.
Embracing the plugin – Goobi’s modular structure, practical applications and an overview
The last presentation before lunch was given by Steffen Hankiewicz, who talked about the technical background to Goobi’s modular structure. He began by explaining the difference between a monolithic and a modular architecture, how a plugin interface basically works and which plugin interfaces have already been developed for Goobi. Using real examples of production workflows from digitisation projects at various institutions, he demonstrated how flexible Goobi’s existing modular components already are and how easily Goobi can be extended with plugins that allow users to apply different methodologies.
Steffen Hankiewicz also described intranda’s approach to plugin development and discussed the need for a clearer overview of the available plugins, greater transparency and more effective documentation.
In response to a question about the ideal form for a platform providing information on newly developed functions, he went on to present the new Goobi Marketplace, which already contains clearly structured listings, descriptions and documentation for between 110 and 140 plugins, modules and other tools. Some of the documentation still needs to be revised, so the Goobi Marketplace will probably not make its official debut with its full initial content until 1 November 2014. intranda will operate the site initially in partnership with four commercial and two public-sector organisations. This will help to ensure that the content is kept up to date and that new Goobi plugins and tools are added regularly and transparently.
Before the lunch break, there was a live demonstration to give the audience an insight into the current version of the Goobi Marketplace. This was accompanied by an explanation of the different areas of the site and of the opportunities for interaction between developers and users.
Subject-based discussion groups to facilitate the exchange of information and ideas between digitisation professionals – users, developers and those working in support roles
Over lunch, participants took the opportunity to follow up on the themes covered in the morning session. The afternoon then began with an invitation to join one or more discussion groups and to spend the next hour and a half asking and answering as many questions as they wished. The discussion groups covered the following subjects:
- workflow, named entity recognition, authority data, OCR and long-term archiving
- scanning, photography, image generation, scanning hardware
- metadata, import plugins, data formats, exporting, interfaces and mapping
- administration, support, maintenance, monitoring, infrastructure
The resulting exchange was extremely wide-ranging and lively. Users from very different backgrounds formed discussion groups to share ideas, offer advice and answer questions with reference to their individual methods and experiences.
Overall review of subject-based discussions and plenary session
For the last item on the formal agenda, all participants gathered once again in the presentation room. The aim of this session was to review the main points of the subject-based discussions as a whole group while allowing time for further detailed questions. This again led to a lively exchange between all participants, including a discussion of best practice.
The event exceeded all expectations, thanks to a lively exchange between users throughout the day, a series of highly informative specialist presentations and the support provided by intranda’s software developers, system administrators and members of the support team.
We would like to thank all speakers, participants and partners for their cooperation and for the very positive feedback we have received. We are already looking forward to the next Goobi Day.