Persistent Identifiers for Research Data in Europe
All research disciplines are confronted with an enormous increase in the number of data objects they deal with and in the complexity of the relationships among them. Objects are organized in virtual collections that are increasingly defined by the needs of the analyzing researcher: individual objects or sets of objects are grouped together in almost arbitrary ways, with metadata used to store all information. Semantic weaving, which is applied in particular in the humanities and social sciences, relates not only objects but also fragments of objects; in electronic publications, references to fragments of objects are used to prove claims; and interoperability between objects increasingly relies on references to commonly used objects. There are many more examples of the increasing relevance of references for documenting the results of research work, and often these references are created by (semi-)automatic means.
Just as we are starting to understand that we need to improve our efforts to preserve digital research data, we need to recognize that references between data objects and their elements are part of the research data infrastructure and need to be preserved as well, so that data remain interpretable in the future. References have long been known in the research world, for example for citing research results in publications. What is new, however, is the sheer mass of references and the granularity at which we will have to manage them. References occur at various levels and for many different purposes within and across digital archives. Thus, in contrast to former referencing systems, we are not speaking about hundreds of references to be managed, but about millions. Most references will not be followed by human inspection but will be used as part of automatic procedures, so highly performant and highly available resolving systems are required to offer satisfactory services.
For this reason, the Max Planck Society decided to set up a reliable system for creating, resolving and storing persistent identifiers for all its researchers. For the same reason, research infrastructures such as CLARIN understand the necessity of providing a system that can be used by all their centres, and projects in the domain of digital cultural heritage see the need to provide a robust system for their institutions.
To meet these forthcoming requirements from various communities within the research and cultural heritage domain for a robust, performant and available service for registering and resolving persistent identifiers, an initiative has been started to share the burden of offering a persistent identifier system and thereby to offer a higher chance of persistency. The participating institutions commit themselves to offering a joint and redundant system, the business model of which will be defined by the research communities. This service, based on the Handle System, will offer the required performance and robustness. It is not seen as a competitor to other offers, but as an additional one.
The participating institutions declare their willingness to work out an appropriate, sustainable service, operating and business model which will extend the service already provided by the GWDG for the Max Planck Society. Interested communities will be invited to participate in the discussions about the principles of a shared and therefore highly available and highly persistent service. In the first year, ePIC will work on a prototype solution for such a robust system, with the intention of turning it into a full production service.
ePIC will seek, in discussions with CNRI, a proper basis for the smooth continuation of the Handle System and for establishing the required independence. Other well-known institutions are welcome to participate in setting up and maintaining this shared persistent identifier system in Europe.
Together with other stakeholders, the participating institutes will take part in and support the founding of an international governing board to guide the further operation and development of the Handle System. The purpose of this board is to safeguard the investments of the scientific community in using the Handle System for research data.
GWDG, SARA (now SURFsara), CSC, DKRZ