Open data are data that are freely accessible, without a subscription or paywall. Those who publish open data indicate via a licence whether and how the data may be reused by others.
Openness of research data is the starting point of the global Open Science movement. More and more scientists realise that the results of publicly funded research, including the underlying data, should be made available to a wide audience to promote transparency, reuse and interdisciplinarity.
Many government organisations at home and abroad publish some of their data openly, for the sake of transparency, reuse and accountability.
In the heritage sector, too, there have been initiatives to make collection data openly available to the public since 2010. This also applies to the Library.
Linked open data (LOD)
There are several ways to capture structured data. The most common is the table. Linked data has also gained ground in recent decades. Recording data as linked data means that the data are connected to each other and can be queried semantically. Because linked data builds on standard web technologies such as HTTP, RDF and URIs, it is machine-readable by design. Another strength of linked data is that the meaning is embedded in the data itself, whereas in tabular data the meaning is only implied (for example by a column header), so machines cannot recognise and interpret it unaided.
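To make the contrast concrete, here is a minimal sketch in Python. It stores the same facts once as a table row and once as linked-data triples (subject, predicate, object). The `example.org` URIs, the vocabulary terms and the sample record are all hypothetical and serve only as illustration.

```python
EX = "https://example.org/"  # hypothetical namespace, for illustration only

# A table row: what "city" means is known only to a human reading the header.
table_row = {"name": "Example Cinema", "city": "Amsterdam"}

# The same facts as triples. Every subject and predicate is a URI, and the
# city is itself a resource that other statements can point to.
triples = [
    (EX + "cinema/1", EX + "vocab/name", "Example Cinema"),
    (EX + "cinema/1", EX + "vocab/locatedIn", EX + "place/amsterdam"),
    (EX + "place/amsterdam", EX + "vocab/name", "Amsterdam"),
]


def objects_of(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the link from the cinema to its city, then read the city's name.
city_uri = objects_of(EX + "cinema/1", EX + "vocab/locatedIn")[0]
city_name = objects_of(city_uri, EX + "vocab/name")[0]
print(city_uri, city_name)
```

In the table the city is just a string. In the triples it is a resource with its own URI, so another dataset could add statements about the same place and a query could follow that link.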
The LOD movement is growing rapidly, which increases the possibilities for linking and enriching existing information and for improving the search experience. An example of a very large LOD collection is Wikidata.
Linked Open Data fits perfectly with the FAIR principles we have embraced as a university library:
- Data becomes findable by using unique identifiers (URIs) for each resource, which can be easily discovered by search engines and other tools.
- Data becomes accessible by providing open and standardised interfaces (e.g. SPARQL for data querying).
- Data becomes interoperable by using standardised ontologies and vocabularies that allow data to be easily integrated with other datasets.
- Data becomes reusable by providing detailed and machine-readable metadata describing the content, structure and context of the data.
LOD applications usually have a search function that works with the SPARQL query language. That language allows you to query one or more datasets at a time. To use linked datasets for research, you will first have to learn it. Many tutorials are available online, for instance from data.world or The Carpentries.
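As a first impression of what such a query looks like, the sketch below sends a small SPARQL query to the public Wikidata endpoint using only the Python standard library. The query (five items that are an instance of "house cat") is an arbitrary illustration, and the function names and the User-Agent string are assumptions made for this example, not part of any library.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Five items that are an instance of (P31) house cat (Q146), with labels.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params


def rows(result):
    """Flatten SPARQL JSON results into plain dicts of variable -> value."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def run(query):
    """Send the query over HTTP and return the rows (needs network access)."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "sparql-sketch/0.1 (example)"},  # placeholder
    )
    with urllib.request.urlopen(request) as response:
        return rows(json.load(response))
```

Calling `run(QUERY)` returns a list of dictionaries with the keys `item` and `itemLabel`. Other SPARQL endpoints can often be queried in the same way by passing a different `endpoint`, although the accepted parameters may differ per service.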
Linked data can be made available as a continuous stream, accessible via an API. This is useful if the data are subject to constant change. A linked open dataset can also be downloaded in its entirety for further analysis.
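Once a dataset has been downloaded in full, a first analysis needs little tooling. The sketch below assumes the dump is in N-Triples format (one triple per line) and counts how often each predicate occurs. The file name `dump.nt` and the sample lines are hypothetical, and the line splitter is deliberately naive; for real work an RDF library such as rdflib handles the edge cases it ignores.

```python
from collections import Counter


def parse_line(line):
    """Split one N-Triples line into (subject, predicate, object).

    Returns None for blank lines and comments. Simplification: relies on
    subjects and predicates never containing whitespace.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    subject, predicate, rest = line.split(None, 2)
    # Drop the terminating " ." but keep any dots inside a literal.
    obj = rest[:-1].rstrip() if rest.endswith(".") else rest
    return subject, predicate, obj


def predicate_counts(lines):
    """Count how many triples use each predicate."""
    parsed = (parse_line(line) for line in lines)
    return Counter(triple[1] for triple in parsed if triple)


# Hypothetical sample; with a real dump you would pass an open file instead,
# e.g. `with open("dump.nt", encoding="utf-8") as f: predicate_counts(f)`.
sample = [
    '<https://example.org/cinema/1> <https://example.org/vocab/name> "Example Cinema." .',
    '<https://example.org/cinema/1> <https://example.org/vocab/locatedIn> <https://example.org/place/amsterdam> .',
    '<https://example.org/place/amsterdam> <https://example.org/vocab/name> "Amsterdam" .',
    "# a comment line",
]
counts = predicate_counts(sample)
print(counts.most_common())
```

A predicate count like this gives a quick overview of which properties a dataset actually uses before you write more targeted queries.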
Linked Open Data is a basis for automated search and artificial intelligence. The virtual assistants Siri (Apple), Alexa (Amazon) and Google Assistant draw part of their knowledge from it.
Open collection data
The data of the UvA collections (including the Allard Pierson) are available in these formats, under these licences, and through various publishing channels such as downloads, OAI harvesting, APIs, Linked Data Endpoints and other tools.
In addition, individual records are available through CataloguePlus in various formats (raw MARC, various citation and reference manager/RIS formats).
Making metadata accessible as linked open data is labour-intensive work that we are carrying out in stages.
Open data provided by UvA researchers
In 2022, the Library, together with the UvA CREATE Lab, made three existing datasets available as Linked Open Data: Cinema Context, Onstage and Ecartico.
- Cinema Context is an online encyclopaedia of film (culture) in the Netherlands from 1896 onwards. It is a database of historical and contemporary film theatres in the Netherlands and of the more than 100,000 film screenings held since 1896. It provides insight into the DNA of Dutch film and cinema culture and into part of the cultural life of the Netherlands.
- How the data are modelled and what you can then do with the data (via example queries) can be found at https://uvacreate.gitlab.io/cinema-context/cinema-context-rdf/.
- Onstage contains all performances, revenues and cultural programmes in Amsterdam's public theatres in the 17th and 18th centuries. It covers everything from the theatre programmes and the plays performed each day to the names of actors and translators, and even the revenues over time.
- Ecartico is a database with biographical information on people involved in the cultural sector in Amsterdam in the 16th-18th centuries: artists, painters, actors, venue owners, etc. You can do more than search and browse for data on people or make selections from certain types of data. With Ecartico, you can also visualise and analyse data on cultural entrepreneurs and their 'environments'.
No single generic licence applies to this data. Check each dataset to see which licence applies to it.