Introduction to data management planning
Joy Davidson, Digital Curation Centre
Acknowledgements: content contributed by Sarah Jones, Jonathan Rans
Funded by: Digital Curation Centre

Definition of research data
'Research data' refers to information, in particular facts or numbers, collected to be examined and considered as a basis for reasoning, discussion or calculation. In a research context, examples of data include statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images. The focus is on research data that is available in digital form.
Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020 v.1.0, 11 December 2013, Footnote 5, p3

How does research data fit in with the theme of open science?
"science carried out and communicated in a manner which allows others to contribute, collaborate and add to the research effort, with all kinds of data, results and protocols made freely available at different stages of the research process."
Research Information Network, Open Science case studies
www.rin.ac.uk/our-work/data-management-and-curation/open-science-case-studies

Levels of open data
★ make your stuff available on the Web (whatever format) under an open licence
★★ make it available as structured data (e.g. Excel instead of a scan of a table)
★★★ use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
Tim Berners-Lee's proposal for five star open data - http://5stardata.info

"Open data and content can be freely used, modified and shared by anyone for any purpose"
http://opendefinition.org

How does research data management fit into the picture?
Create, Document, Use, Store, Share, Preserve
• Data Management Planning
• Creating data
• Documenting data
• Accessing / using data
• Storage and backup
• Selecting what to keep
• Sharing data
• Data licensing and citation
• Preserving data

Funders have expectations about data sharing…
"The European Commission's vision is that information already paid for by the public purse should not be paid for again each time it is accessed or used, and that it should benefit European companies and citizens to the full."
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
Data management plans are requested from those participating in the Open Data pilot.

"Data sets are becoming the new instruments of science"
Dan Atkins, University of Michigan

…but RDM is part of good research practice!

DMPs can help
Projects participating in the pilot will be required to develop a Data Management Plan (DMP), in which they will specify what data will be open.
Note that the Commission does NOT require applicants to submit a DMP at the proposal stage. A DMP is therefore NOT part of the evaluation. DMPs are a deliverable for those participating in the pilot.

What aspects of RDM should be in a DMP?
• What data will be created (format, types, volume...)
• Standards and methodologies to be used (incl. metadata)
• How ethics and Intellectual Property will be addressed
• Plans for data sharing and access
• Strategy for long-term preservation

A DMP is a plan to share!

What is metadata? What is the difference?
• Metadata
  – Standardised
  – Structured
  – Machine and human readable
Metadata / Documentation

How should you describe your data?
http://www.dcc.ac.uk/resources/metadata-standards

What is the minimum required?
• DataCite metadata used by OpenAIRE
• Citation/disambiguation
  – Identifier e.g. DOI
  – Creator
  – Title
  – Publisher
  – Publication Year
• Licensing/access conditions

Where will you store the data during your research?
• Your own laptop
• University systems
• Cloud storage
• Combination
Your decision will be based on how sensitive your data are, how robust you need the storage to be, who needs access to the data, and when they need access to the data!

Which data must be kept?
• Data, including associated metadata, needed to validate the results in scientific publications
• Other curated and/or raw data, including associated metadata, as specified in the DMP
Doesn’t apply to all data (researchers to define as appropriate)
Don’t have to share data if inappropriate – exemptions apply

Responsible researchers: know about exemptions
• If results are expected to be commercially or industrially exploited
• If participation is incompatible with the need for confidentiality in connection with security issues
• If participation is incompatible with existing rules on the protection of personal data
• If participation would jeopardise the achievement of the main aim of the action
• If the project will not generate / collect any research data
• If there are other legitimate reasons to not take part in the Pilot
Projects can opt out at proposal stage OR during the lifetime of the project
Issues should be described in the project Data Management Plan

Which additional data might be kept after the project ends?
- Could this data be re-used?
- Must it be kept as evidence or for legal reasons?
- Should it be kept for its value to you or others?
- Consider costs – do the benefits outweigh the cost?
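
A minimal sketch of the citation metadata listed on the "What is the minimum required?" slide (identifier, creator, title, publisher, publication year, plus licensing/access conditions). The class name, field names and example values are hypothetical and chosen for illustration; this is not the official DataCite schema or any repository's API. The only external fact assumed is that a DOI can be resolved through https://doi.org/.

```python
from dataclasses import dataclass, asdict


@dataclass
class MinimalRecord:
    """Illustrative record; field names mirror the slide, not the DataCite XML."""
    identifier: str        # e.g. a DOI
    creator: str
    title: str
    publisher: str
    publication_year: int
    licence: str           # licensing/access conditions


def missing_fields(record):
    """Return the names of fields that are empty, in declaration order."""
    return [name for name, value in asdict(record).items() if value in ("", None)]


def doi_url(doi):
    """Turn a bare DOI into a clickable link via the doi.org resolver."""
    return "https://doi.org/" + doi.strip()


# All values below are made up for the example.
record = MinimalRecord(
    identifier="10.1234/example-dataset",
    creator="Example, Researcher",
    title="Example survey results",
    publisher="Example Repository",
    publication_year=2015,
    licence="CC-BY-4.0",
)

print(missing_fields(record))
print(doi_url(record.identifier))
```

A check like `missing_fields` could be run before depositing a dataset, so that gaps in the description are caught while the project team can still fill them.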
5 steps to decide what data to keep
www.dcc.ac.uk/resources/how-guides/five-steps-decide-what-data-keep

Assign persistent identifiers
• A persistent identifier is an alphanumeric code identifying a resource, organisation or individual
• They must be
  – Unique
  – Persistent
• Ideally they should be actionable too
https://ssi-dev.epcc.ed.ac.uk/

Remember to consider physical data, software and models
http://spatialinformationdesignlab.org/project_sites/library/catalog.html
http://www.ukcrcexpmed.org.uk/Coventry_Warwick_CRF/PublishingImages/Tissue%20Bank%201.jpg

Can your data be shared with others?
• PI/researcher
• Data repository and support staff
• Research participants
• Commercial partners
• Secondary data user

How will it be shared?
http://service.re3data.org/search

Zenodo
• Joint effort by OpenAIRE-CERN
• Multidisciplinary repository
• Multiple data types
• Citable data (DOI)
• Links funding, publications, data & software
www.zenodo.org

• Does your publisher or funder suggest a repository?
• Are there data centres or community databases for your discipline?
• Does your university offer support for long-term preservation?

Licensing research data
www.dcc.ac.uk/resources/how-guides/license-research-data
This DCC guide outlines the pros and cons of each approach and gives practical advice on how to implement your licence

CREATIVE COMMONS LIMITATIONS
NC Non-Commercial: What counts as commercial?
ND No Derivatives: Severely restricts use
Licences with these clauses are not open licences

Horizon 2020 Open Access guidelines point to: or

EUDAT licensing tool
Answer questions to determine which licence(s) are appropriate to use
http://ufal.github.io/lindat-license-selector

Options for open data
• Domain repository
• General repository – Figshare, Zenodo, Dryad
• Institutional repository
• Journal supplementary material
• Departmental web page

Finding external repositories
• General directories: Re3data.org
• Domain specific directories, e.g. life sciences – Biosharing.org
• Data journal recommendations: Edinburgh research data blog, Sources of dataset peer review
• Funding body recommendations, e.g. Wellcome Trust: Data repositories and database sources

Considerations
• There may be an accepted repository used by peers or required by funders
• Multidisciplinary studies may not have an obvious home
• Data types and volumes will affect the decision

How will you make your data discoverable?
http://ckan.data.alpha.jisc.ac.uk/dataset
https://www.researchfish.com/
http://gtr.rcuk.ac.uk/
http://researchdata.gla.ac.uk/

Options for closed data
• Institutional data archive/vault
• Safe havens (e.g. for secure patient data)
• 3rd party data archiving
• Cloud storage
• Institutional servers – the 'do nothing' option

Approach: as open as possible, as closed as necessary
Image: ‘Balancing rocks’ by Viewminder CC-BY-SA -ND www.flickr.com/photos/light_seeker/7780857224

Refer to free guides and briefing papers
www.dcc.ac.uk/resources/

Guidelines from the Commission
• Factsheet on Open Access
  – https://ec.europa.eu/programmes/horizon2020/sites/horizon2020/files/FactSheet_Open_Access.pdf
• Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020
  – http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
• Guidelines on Data Management in Horizon 2020
  – http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

Make use of free tools
https://dmponline.dcc.ac.uk

• More visible research outputs and increased impact - even for negative results
• Easier outputs reporting
• Better and more reproducible research!
It may seem like a lot, but take it step by step!

Thanks for listening!
joy.davidson@glasgow.ac.uk
www.dcc.ac.uk
Follow us on twitter: @jd162a @digitalcuration and #ukdcc

DMPonline demo
https://dmponline.dcc.ac.uk

Sign-up as UMinho user
DMPonline: institutional usage by UMinho community

Returning users see 'My plans' page
Summary of the DMPs that you have created, or that others have shared with you. Note the varying permissions.

Creating a new plan
Select funder (if any)
Select organisation for additional questions and guidance
Select other sources of guidance

Plan details: summary
Summary of the sections and questions in your DMP

Overview of sections in a DMP
Summary page with dropdown buttons to expand and answer each section
Enables multiple phases

Answering questions
Notes who has answered the question and when
Progress bar shows how many questions remain
Institutions can provide examples and suggested answers

Sharing plans
Give colleagues read-only or read-write access, or make them co-owners
Internal and external partners

Collaborative writing of DMPs
Sections are locked for editing when they're being worked on by colleagues

Phases
Templates can have multiple phases
Remember to update the DMP throughout the life of the project!

Exporting DMPs
Can export as plain text, PDF, html...

Good practice when creating DMPs
• Start early
• Cost sufficient effort into the application
• Write the plan collaboratively
• Be realistic
• Update the DMP

DMPonline: Library can provide support to UMinho users

Try it out
https://dmponline.dcc.ac.uk

Thanks – any questions?
DMP guidance, tools and resources: www.dcc.ac.uk/resources/data-management-plans
Follow us on twitter: @digitalcuration and #DMPonline