Reusing millions of records from Europeana

Posted on 05.08.2014 by Sašo Zagoranski

We started Museums.EU with a vision to create a central knowledge base about museums and galleries in Europe. This includes basic information such as location, opening hours and ticket prices, but also exhibitions, events and activities.

Early feedback and the large number of museums that have signed up for our FREE CMS indicate that we’ve built an excellent online tool for managing their data.

While the number of museum, exhibition and event records runs into the tens of thousands, the number of objects from different collections runs into the millions. Luckily, Europe has made a lot of progress in digitizing collections during the last few years, and at the forefront of this movement is Europeana – an organization dedicated to making cultural heritage and museum collections available online.

Europeana offers an actively developed API that enables access to its database of records. An API token, help and documentation are available at Europeana Labs, which was developed as part of the Europeana Creative project, where Semantika is a partner.

Here is an example of the final integration, taken from Rijksmuseum’s collection:

Step 1: Connecting Museums.EU with Europeana

Our first challenge was to connect the two databases. Since Europeana doesn’t yet offer a unique numeric ID for organizations, we’ve linked both databases via string identifiers, specifically via the edm:dataProvider tag in Europeana’s Data Model. We added an additional “EuropeanaQuery” field to each museum in our database, so that we now have a persistent connection between the two websites.
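As a rough illustration of such a link, the sketch below stores a query string next to each museum record. The Museum class, its field names and the DATA_PROVIDER:"..." Solr-style phrase filter are assumptions made for this example; they are not a description of the actual Museums.EU schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Museum:
    """Hypothetical museum record; field names are illustrative only."""
    name: str
    europeana_query: Optional[str] = None  # the stored "EuropeanaQuery"


def build_provider_query(data_provider: str) -> str:
    """Build a Solr-style phrase query on the data provider name.

    Assumes the provider is searchable through a DATA_PROVIDER field.
    Backslashes and double quotes are escaped so the name stays one phrase.
    """
    escaped = data_provider.replace("\\", "\\\\").replace('"', '\\"')
    return f'DATA_PROVIDER:"{escaped}"'


museum = Museum(name="Rijksmuseum")
museum.europeana_query = build_provider_query("Rijksmuseum")
print(museum.europeana_query)  # DATA_PROVIDER:"Rijksmuseum"
```

Because the link is a string rather than a numeric ID, it would need to be updated if a provider changed the name it publishes under.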

For new records, Museums.EU is set up in such a way that it’s very easy to add this connection, but we did have some extra work for the thousands of museums and galleries that were already in the database. We took all the museums in our database and all data providers currently providing records to Europeana and used heuristic matching: comparing the name of the museum with the name in the edm:dataProvider field. Entries with a very high matching percentage were linked together automatically, while some were manually reviewed to prevent mistakes.

Heuristic matching was done using the same technology that you probably use daily in your word processor – the Damerau–Levenshtein distance. Simply put, the distance measures how many changes are needed to turn one text into the other or, in the case of museum name matching, how closely the name in our database matches the data provider in Europeana. When a successful match was found, we created a Europeana query (based on the Solr query rules for Europeana) and stored that for each museum or gallery.
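To make the idea concrete, here is a minimal sketch of name matching with an edit distance. It uses the optimal string alignment variant (a restricted form of Damerau–Levenshtein that counts insertions, deletions, substitutions and swaps of adjacent characters). The function names, the normalisation step and the 0.95 and 0.80 thresholds are assumptions chosen for illustration, not the values used on Museums.EU.

```python
from typing import List, Optional, Tuple


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[rows - 1][cols - 1]


def similarity(a: str, b: str) -> float:
    """Turn the distance into a 0..1 score, ignoring case and outer spaces."""
    a, b = a.strip().casefold(), b.strip().casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


def classify(museum_name: str, providers: List[str],
             auto: float = 0.95, review: float = 0.80) -> Tuple[str, Optional[str]]:
    """Pick the closest provider and decide how to handle the match."""
    best = max(providers, key=lambda p: similarity(museum_name, p), default=None)
    if best is None:
        return ("none", None)
    score = similarity(museum_name, best)
    if score >= auto:
        return ("auto", best)    # link automatically
    if score >= review:
        return ("review", best)  # queue for manual review
    return ("none", None)


print(classify("Rijksmusuem", ["Rijksmuseum", "Louvre"]))
```

Dividing by the longer name's length keeps the score comparable between short and long names, which is one simple way to arrive at a "matching percentage".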

Step 2: Using the API

Once we had the initial connections, it was time to actually start calling the API. The API returns results in JSON format, so decoding the results and understanding the data structure was a necessary first step. We used Europeana Labs’ Console tool, which enables users to enter a search term and then generates a request and response for that particular search term. It’s a useful tool and it makes development easier, especially at the beginning.
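For readers who prefer code to a console, the sketch below shows roughly what such a request could look like. The endpoint URL is an assumption and may differ from the one in use when this post was written, so check Europeana's current documentation. The wskey, query, rows and start parameters follow the public Search API, and the helper names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against Europeana's current documentation.
SEARCH_URL = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(api_key: str, query: str, rows: int = 12, start: int = 1) -> str:
    """Build a search request URL with properly encoded parameters."""
    params = {"wskey": api_key, "query": query, "rows": rows, "start": start}
    return f"{SEARCH_URL}?{urlencode(params)}"


def search(api_key: str, query: str, rows: int = 12) -> dict:
    """Call the API and decode the JSON response (performs a network request)."""
    with urlopen(build_search_url(api_key, query, rows), timeout=10) as response:
        return json.load(response)


# Only builds and prints the URL; no request is sent here.
print(build_search_url("YOUR_API_KEY", "Rijksmuseum"))
```

Calling search("YOUR_API_KEY", "Rijksmuseum") with a valid key would return the decoded JSON as a Python dictionary.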

Here’s an example of what the Console produces for the search term “Rijksmuseum”:

Museums.EU uses an n-tier architecture, which enabled us to easily integrate an additional “Europeana data and mapping layer”. It handles two tasks: querying, which calls the API and returns the results, and mapping, which converts data fields from EDM to our own internal model. This approach ensures that Europeana collections are represented in the same way as collections that were entered via our Collection Management System. All queries are done at runtime, which means that all data, including images, is served directly from Europeana and the data providers and has not been copied to our own database. This ensures that data on Museums.EU always reflects the latest version on Europeana, and it also ensures copyright compliance.
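The mapping half of such a layer can be sketched as a small function that turns one search result into an internal object. The CollectionObject class is hypothetical, and the item keys used here (id, title, dataProvider, edmPreview, guid) are assumptions based on the public search response format. The sample payload is hand-written for illustration and is not real API output.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class CollectionObject:
    """Hypothetical internal model shared by CMS and Europeana records."""
    source_id: str
    title: str
    provider: Optional[str]
    thumbnail_url: Optional[str]  # image stays hosted remotely, not copied
    source_url: Optional[str]


def _first(value: Any) -> Optional[str]:
    """Many fields arrive as lists; take the first entry if there is one."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def map_item(item: Dict[str, Any]) -> CollectionObject:
    """Convert one search result item into the internal model."""
    return CollectionObject(
        source_id=item["id"],
        title=_first(item.get("title")) or "Untitled",
        provider=_first(item.get("dataProvider")),
        thumbnail_url=_first(item.get("edmPreview")),
        source_url=item.get("guid"),
    )


def map_response(payload: Dict[str, Any]) -> List[CollectionObject]:
    """Map every item of a decoded search response."""
    return [map_item(item) for item in payload.get("items", [])]


# Hand-written sample payload, for illustration only.
sample = {
    "success": True,
    "items": [
        {
            "id": "/0000/example_record",
            "title": ["Example painting"],
            "dataProvider": ["Rijksmuseum"],
            "edmPreview": ["https://example.org/thumb.jpg"],
            "guid": "https://example.org/record/0000/example_record",
        },
        {"id": "/0000/minimal_record"},
    ],
}

for obj in map_response(sample):
    print(obj.title, "|", obj.provider)
```

Keeping the conversion in one place means the presentation layer only ever sees the internal model, whichever source a record came from.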

Future work

We’re very happy with the work we have done so far and would like to thank Europeana and its developers for the help and support we received during development. What you see on the page is only version 1. In the next few weeks, we plan on adding many more enhancements, especially in the presentation layer (e.g. improving the display of collections). We also look forward to further development of Europeana. Its near-term development roadmap includes “Media queries”, which enable much better access to media content such as images.

So as Museums.EU develops and grows, so will our integration with Europeana and the reuse of its huge collection of objects from museums and galleries across Europe.