Making research data repositories visible and discoverable
Robert Ulrich – Karlsruhe Institute of Technology

Outline
• Background
• Mission
• Project
• Impact
• Sustainability
• Lessons learned & best practices

Background
• Research data are valuable and ubiquitous
• New technologies facilitate data-intensive science
• Broad discussion about permanent access to research data
• Increasing requirements from funders to make research data openly available
• Growing demand for trustworthy and sustainable research data repositories
• Trend: data journals

Background
Funding organizations: Data Policies
Example: European Commission
European Commission. (2014). Horizon 2020 Annotated Model Grant Agreements. Version 1.6.2. Retrieved from http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/amga/h2020-amga_en.pdf

Background
Journals: Data Policies
Example: Nature Publishing Group
“[...] authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications.”
Example: PLOS
“PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.”

Background
Generic range of research data (with examples)
• intrinsic findings
• observations, measurement data
• technical documentation, method descriptions
[Figure: example fields: quantum physics, software, mathematics, sociological research, geoscience, experiments at CERN]

Background
• Research data are extremely varied in nature.
• Information management designed for conventional information/library media can handle research data only imperfectly.
• Research data repositories (RDRs) often represent an essential stage of compression, abstraction and summary of research data, authorized and authenticated by the producers.
• RDRs can be operated centrally (institutional RDRs) or locally (disciplinary RDRs).
• Local or disciplinary RDRs in particular are very popular in science because they represent a bottom-up approach to research data management by the research groups themselves.

Research Data Repositories
• Highly heterogeneous landscape of research data repositories
• Different communities and different approaches
• EC (2009): ICT infrastructures for e-science
“The landscape of data repositories across Europe is fairly heterogeneous, but there is a solid basis to develop a coherent strategy to overcome the fragmentation and enable research communities to better manage, use, share and preserve data.”

The RDR Landscape
[Figure: funders, journals, scientists, universities and research labs, and research data repositories; icons: RRZE Icon Set (CC BY-SA)]
• Investigators are expected to share their data!
• Underlying data must be accessible!
• Should we offer repositories for all disciplines?
• Where can I find data and store mine?
• How can we set up repositories?
Research Data Repository
• Example: Disciplinary repositories
PANGAEA, http://www.pangaea.de
GEO, http://www.ncbi.nlm.nih.gov/geo/

Research Data Repository
• Example: Institutional repositories
Open Data LMU, http://data.ub.uni-muenchen.de/
PURR, http://research.hub.purdue.edu

Research Data Repository
• Example: Project-focused repositories
BDPP, http://www.digitalpantheon.ch
SDDB, http://www.scientificdrilling.org

Research Data Repository
• Example: Generic repositories
Figshare, http://figshare.com
Zenodo, http://zenodo.org

Publishing strategies
• As an independent information object
• As a document within a reviewed “data paper”
• As an addition to a reviewed article
[Figure: each published object identified by a DOI; icons: RRZE Icon Set (CC BY-SA)]

Mission
re3data.org
• is a global registry of research data repositories
• covers research data repositories from all academic disciplines
• helps researchers, funding bodies, publishers and scholarly institutions to find research data repositories
• aims to promote a culture of sharing, increased access and better visibility of research data

Schema
• general information (e.g. short description of the RDR, content types, keywords)
• responsibilities (e.g. institutions responsible for funding, content or technical issues)
• policies (e.g. policies of the RDR, including their URLs)
• legal aspects (e.g. licenses of the database and datasets)
• technical standards (e.g. APIs, versioning of datasets, software of the RDR)
• quality standards (e.g. certificates, audit processes)

Icons
• The research data repository provides additional information on its service.
• The research data repository provides open/restricted/closed access to its data.
• The terms of use and licenses of the data are provided by the research data repository.
• The research data repository provides a policy.
• The research data repository uses a persistent identifier system to make its provided data persistent, unique and citable.
• The research data repository is either certified or supports a repository standard.
[Figure: schema overview: research data repository with general information, policy, legal aspects, technical standards, quality standards]

Quality Requirements
• Run by a legal entity, such as a sustainable institution (e.g. library, university)
• Clarify access conditions to the data and repository as well as the terms of use
• Focus on research data
• (Have an English graphical user interface)

Workflow

Interface
[Screenshots: search, filters, results, icons]

Guidelines
“Datasets are more likely to be seen, reused, and have impact if they can be found where potential reusers are likely to look. If you are unsure where that might be, the Registry of Research Data Repositories (re3data.org) provides a list of repositories organised by subject, content type and country.”
Alex Ball (DCC) and Monica Duke (DCC), How to Measure the Impact of Research Data, A Digital Curation Centre ‘working level’ guide

Guidelines
Scientific Data (NPG): “Physics, astrophysics, astronomy and geoscience databases should be registered with re3data.org.”
European Commission

Growth

Four dimensions of sustainability
• Technology
• Legal
• Finance
• Organisation

Sustainability - Technology
• Open interfaces
RESTful API
OpenSearch
Documentation: http://www.re3data.org/api/doc
Various usage scenarios, e.g. by OpenAIRE
• Open metadata
Documentation: http://www.re3data.org/schema/

Sustainability - Legal
• Open licenses
• CC BY for web page content
• CC0 for metadata

Sustainability - Partners
• Berlin School of Library and Information Science
• GFZ German Research Centre for Geosciences
• Karlsruhe Institute of Technology (KIT), KIT Library
• Purdue University, Purdue Libraries
Funded by
• German Research Foundation
• Institute of Museum and Library Services (IMLS)

Sustainability - Cooperations
• German Initiative for Network Information (DINI)
• DataCite (MoU, April 2012)
• OpenAIRE (MoU, October 2013)
• BioSharing (MoU, November 2013)
• Databib (MoU, March 2014)
• RDA

Sustainability - Join forces
Databib and re3data.org have agreed to the following five principles for successful cooperation:
• Openness
• Optimal quality assurance
• Development of innovative functionalities
• Shared leadership
• Sustainability

Sustainability - Organization
➔ Databib and re3data.org merged in spring 2015
➔ re3data.org will become a service of DataCite

Sustainability - Finance
• Hosting and maintenance financed by DataCite
• Future management by DataCite
• Possibly further third-party funding

Lessons learned
• Openness as a paradigm works well (“open science” & schema development)
• Cooperation is worth the effort (Databib & DataCite are reliable partners)
• Quality assurance by an international editorial board

Raise awareness
• Among service units & researchers
• Teach students & young scientists
• Not during lectures
• But during projects & theses

Clarify responsibilities
Responsibility: Correctness of data. Responsible: Scientist
Responsibility: Data management. Responsible: Service units

Thank you for your attention!
info@re3data.org
http://re3data.org

With the exception of all photos and graphics, these slides are licensed under the “Attribution 4.0 International (CC BY 4.0)” licence: http://creativecommons.org/licenses/by/4.0/
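The schema sections and the icons described in the Schema and Icons slides can be thought of as one record per repository from which the displayed icons are derived. The Python sketch below is hypothetical: the class `RepositoryRecord`, its field names, the access values and the `icons` function are illustrative assumptions that mirror the six schema sections, not the actual re3data.org metadata schema (documented at http://www.re3data.org/schema/).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RepositoryRecord:
    """Hypothetical registry entry grouped by the six schema sections."""

    # General information
    name: str
    description: str = ""
    content_types: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    # Responsibilities (institutions responsible for funding, content, technology)
    institutions: list[str] = field(default_factory=list)
    # Policies (URLs of the repository's policy documents)
    policy_urls: list[str] = field(default_factory=list)
    # Legal aspects
    data_access: str = "closed"  # assumed values: "open", "restricted", "closed"
    data_licenses: list[str] = field(default_factory=list)
    # Technical standards
    apis: list[str] = field(default_factory=list)
    pid_system: Optional[str] = None  # e.g. "DOI"
    # Quality standards (certificates or supported repository standards)
    certificates: list[str] = field(default_factory=list)


def icons(record: RepositoryRecord) -> list[str]:
    """Return the icon labels a record qualifies for, in the order of the Icons slide."""
    shown = []
    if record.description:  # assumption: a description counts as "additional information"
        shown.append("additional information")
    shown.append(f"{record.data_access} access")  # access icon is always shown
    if record.data_licenses:
        shown.append("terms of use / licenses")
    if record.policy_urls:
        shown.append("policy")
    if record.pid_system:
        shown.append("persistent identifier")
    if record.certificates:
        shown.append("certified / repository standard")
    return shown


example = RepositoryRecord(
    name="Example Repository",  # hypothetical entry, not a real registry record
    description="Archive for example measurement data",
    data_access="open",
    data_licenses=["CC0"],
    pid_system="DOI",
)
print(icons(example))
# ['additional information', 'open access', 'terms of use / licenses', 'persistent identifier']
```

Keeping the icon logic as a pure function of the record means the interface never stores icon state separately; any change to a record's metadata is reflected automatically.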
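The open interfaces and CC0-licensed metadata listed under Sustainability - Technology and Sustainability - Legal allow services such as OpenAIRE to harvest registry entries programmatically. The Python sketch below parses a repository listing returned as XML. The element names (`repository`, `id`, `name`, `link`), the sample identifiers and URLs, and the `fetch` helper are assumptions made for illustration. The actual endpoints and response format are described at http://www.re3data.org/api/doc.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET

# Assumed shape of a repository listing; the real format is documented at
# http://www.re3data.org/api/doc and may differ.
SAMPLE_LISTING = """<list>
  <repository>
    <id>example-001</id>
    <name>Example Repository A</name>
    <link href="https://example.org/repository/example-001" rel="self"/>
  </repository>
  <repository>
    <id>example-002</id>
    <name>Example Repository B</name>
    <link href="https://example.org/repository/example-002" rel="self"/>
  </repository>
</list>"""


def fetch(url: str) -> str:
    """Download an API response as text (hypothetical helper; take the URL from the API docs)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_listing(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (id, name, link) for every <repository> element in a listing."""
    root = ET.fromstring(xml_text)
    entries = []
    for repo in root.findall("repository"):
        link = repo.find("link")
        entries.append((
            repo.findtext("id", default=""),
            repo.findtext("name", default=""),
            link.get("href", "") if link is not None else "",
        ))
    return entries


# Parse the offline sample; with network access, pass fetch(<documented URL>) instead.
for repo_id, name, href in parse_listing(SAMPLE_LISTING):
    print(repo_id, name, href)
```

Separating download from parsing keeps the parser testable offline. Because the metadata is released under CC0, harvested entries can be reused freely.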