Introduction to data management planning
Joy Davidson, Digital Curation Centre
Acknowledgements: content contributed by Sarah Jones and Jonathan Rans

Definition of research data
‘Research data’ refers to information, in particular facts or numbers, collected to be examined and considered as a basis for reasoning, discussion or calculation. In a research context, examples of data include statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images. The focus is on research data that is available in digital form.
Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020, v1.0, 11 December 2013, footnote 5, p. 3

How does research data fit in with the theme of open science?
“science carried out and communicated in a manner which allows others to contribute, collaborate and add to the research effort, with all kinds of data, results and protocols made freely available at different stages of the research process.”
Research Information Network, Open Science case studies: www.rin.ac.uk/our-work/data-management-and-curation/open-science-case-studies

Levels of open data
⭐ make your stuff available on the Web (whatever format) under an open licence
⭐⭐ make it available as structured data (e.g. Excel instead of a scan of a table)
⭐⭐⭐ use non-proprietary formats (e.g. CSV instead of Excel)
⭐⭐⭐⭐ use URIs to denote things, so that people can point at your stuff
⭐⭐⭐⭐⭐ link your data to other data to provide context
Tim Berners-Lee’s proposal for five-star open data: http://5stardata.info

“Open data and content can be freely used, modified and shared by anyone for any purpose”
http://opendefinition.org

How does research data management fit into the picture?
Create – Document – Use – Store – Share – Preserve
• Data Management Planning
• Creating data
• Documenting data
• Accessing / using data
• Storage and backup
• Selecting what to keep
• Sharing data
• Data licensing and citation
• Preserving data

Funders have expectations about data sharing…
“The European Commission’s vision is that information already paid for by the public purse should not be paid for again each time it is accessed or used, and that it should benefit European companies and citizens to the full.”
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
Data management plans are requested from those participating in the Open Data pilot.

“Data sets are becoming the new instruments of science” – Dan Atkins, University of Michigan

…but RDM is part of good research practice! DMPs can help.
Projects participating in the pilot will be required to develop a Data Management Plan (DMP), in which they will specify what data will be open. Note that the Commission does NOT require applicants to submit a DMP at the proposal stage. A DMP is therefore NOT part of the evaluation. DMPs are a deliverable for those participating in the pilot.

What aspects of RDM should be in a DMP?
§ What data will be created (format, types, volume...)
§ Standards and methodologies to be used (incl. metadata)
§ How ethics and Intellectual Property will be addressed
§ Plans for data sharing and access
§ Strategy for long-term preservation
A DMP is a plan to share!

How will you name your files?
• Keep it simple!
• Agree methods with partners
• Include dates
• Avoid non-alphanumeric characters
• Use hyphens or underscores, not spaces, e.g. day-sheet, day_sheet
• Order the elements logically
Example from ARM Climate Research Facility: www.arm.gov/data/docs/plan
www.jiscdigitalmedia.ac.uk/guide/choosing-a-file-name

What is metadata?
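Metadata is standardised, structured description of data that both machines and people can read. As a minimal sketch, assuming a plain Python dictionary rather than any official DataCite serialisation, the record below holds the fields named under “What is the minimum required?” (identifier, creator, title, publisher, publication year, plus licensing/access conditions) and checks that none is missing. The key names, the DOI and every value are hypothetical placeholders.

```python
import json

# Minimum fields named on the "What is the minimum required?" slide:
# five DataCite citation properties plus licensing/access conditions.
# The lowercase key names are illustrative assumptions, not the DataCite schema's spelling.
REQUIRED_FIELDS = (
    "identifier",
    "creator",
    "title",
    "publisher",
    "publication_year",
    "licence",
)


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record: the DOI, names and licence are placeholders, not a real dataset.
record = {
    "identifier": "10.1234/example-dataset",
    "creator": "Doe, Jane",
    "title": "Example fieldwork observations",
    "publisher": "Example University",
    "publication_year": 2015,
    "licence": "CC-BY-4.0",
}

print(missing_fields(record))  # → []
# The same record as JSON: structured and machine-readable, yet still legible to a person.
print(json.dumps(record, indent=2))
```

Serialising the record as JSON is one way to keep it machine-readable while staying human-readable; a real deposit would follow the chosen repository's own metadata schema.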
What is the difference between metadata and documentation?
• Metadata
– Standardised
– Structured
– Machine and human readable

How should you describe your data?
http://www.dcc.ac.uk/resources/metadata-standards

What is the minimum required?
• DataCite metadata used by OpenAIRE
• Citation/disambiguation
– Identifier, e.g. DOI
– Creator
– Title
– Publisher
– Publication Year
• Licensing/access conditions

Where will you store the data during your research?
• Your own laptop
• University systems
• Cloud storage
• A combination
Your decision will be based on how sensitive your data are, how robust you need the storage to be, who needs access to the data, and when they need access to it.

Which data must be kept?
• Data, including associated metadata, needed to validate the results in scientific publications
• Other curated and/or raw data, including associated metadata, as specified in the DMP
This doesn’t apply to all data (researchers define what is appropriate).
You don’t have to share data where it would be inappropriate; exemptions apply.

Exemptions: reasons for opting out
• If results are expected to be commercially or industrially exploited
• If participation is incompatible with the need for confidentiality in connection with security issues
• If participation is incompatible with existing rules on the protection of personal data
• If participation would jeopardise the achievement of the main aim of the action
• If the project will not generate / collect any research data
• If there are other legitimate reasons not to take part in the Pilot
Projects can opt out at the proposal stage OR during the lifetime of the project, and should describe the issues in the project Data Management Plan.

Which additional data might be kept after the project ends?
Five steps to follow
① Could this data be re-used?
② Must it be kept as evidence or for legal reasons?
③ Should it be kept for its potential value?
④ Consider costs: do the benefits outweigh the cost?
⑤ Evaluate criteria to decide what to keep
Five steps to decide what data to keep: www.dcc.ac.uk/resources/how-guides/five-steps-decide-what-data-keep

Assign persistent identifiers
• A persistent identifier is an alphanumeric code identifying a resource, organisation or individual
• It must be
– Unique
– Persistent
• Ideally it should be actionable too
https://ssi-dev.epcc.ed.ac.uk/

Remember to consider physical data, software and models
http://spatialinformationdesignlab.org/project_sites/library/catalog.html
http://www.ukcrcexpmed.org.uk/Coventry_Warwick_CRF/PublishingImages/Tissue%20Bank%201.jpg

Can your data be shared with others?
• PI/researcher
• Data repository and support staff
• Research participants
• Commercial partners
• Secondary data users

How will it be shared?
http://service.re3data.org/search

Zenodo
• Joint effort by OpenAIRE and CERN
• Multidisciplinary repository
• Multiple data types
• Citable data (DOI)
• Links funding, publications, data & software
www.zenodo.org

• Does your publisher or funder suggest a repository?
• Are there data centres or community databases for your discipline?
• Does your university offer support for long-term preservation?

Licensing research data
This DCC guide outlines the pros and cons of each approach and gives practical advice on how to implement your licence: www.dcc.ac.uk/resources/how-guides/license-research-data

Creative Commons limitations
• NC (Non-Commercial): what counts as commercial?
• ND (No Derivatives): severely restricts use
Licences with these clauses are not open licences.

Horizon 2020 Open Access guidelines point to: … or the EUDAT licensing tool
Answer questions to determine which licence(s) are appropriate to use: http://ufal.github.io/lindat-license-selector

Options for open data
• Domain repository
• General repository: Figshare, Zenodo, Dryad
• Institutional repository
• Journal supplementary material
• Departmental web page

Finding external repositories
Ø General directories: Re3data.org
Ø Domain-specific directories, e.g. life sciences: Biosharing.org
Ø Data journal recommendations: Edinburgh research data blog, ‘Sources of dataset peer review’
Ø Funding body recommendations, e.g. Wellcome Trust, ‘Data repositories and database sources’

Considerations
• There may be an accepted repository used by peers or required by funders
• Multidisciplinary studies may not have an obvious home
• Data types and volumes will affect the decision

How will you make your data discoverable?
http://ckan.data.alpha.jisc.ac.uk/dataset
https://www.researchfish.com/
http://gtr.rcuk.ac.uk/
http://researchdata.gla.ac.uk/

Options for closed data
• Institutional data archive/vault
• Safe havens (e.g. secure patient data)
• 3rd party data archiving
• Cloud storage
• Institutional servers: the ‘do nothing’ option

Approach: as open as possible, as closed as necessary
Image: ‘Balancing rocks’ by Viewminder, CC-BY-SA -ND, www.flickr.com/photos/light_seeker/7780857224

Refer to free guides and briefing papers: www.dcc.ac.uk/resources/

Guidelines from the Commission
• Factsheet on Open Access: https://ec.europa.eu/programmes/horizon2020/sites/horizon2020/files/FactSheet_Open_Access.pdf
• Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
• Guidelines on Data Management in Horizon 2020: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

Make use of free tools
https://dmponline.dcc.ac.uk

What is DMPonline?
• A web-based tool to help researchers write Data Management and Sharing Plans
• Includes requirements and guidance from funders, universities and other groups
• Developed by the Digital Curation Centre

Good RDM helps you comply with mandates but also leads to…
• More visible research outputs and increased impact, even for negative results
• Easier outputs reporting
• Better and more reproducible research!

Thanks for listening!
joy.davidson@glasgow.ac.uk
www.dcc.ac.uk
Follow us on Twitter: @jd162a, @digitalcuration and #ukdcc