The Open Data Pilot: practical implementation
Sarah Jones
Digital Curation Centre, University of Glasgow
sarah.jones@glasgow.ac.uk
Twitter: @sjDCC

How to make data openly available
OPEN RESEARCH DATA
Image CC-BY-NC-SA by Tom Magllery www.flickr.com/photos/lwr/13442910354

How can researchers make data open?
1. Choose the dataset(s) to share
   • What can be made open? This step may need to be revisited if problems are encountered later.
2. Apply an open licence
   • Determine what IP exists. Apply a suitable licence, e.g. CC-BY.
3. Make the data available
   • Provide the data in a suitable format. Use repositories.
4. Make it discoverable
   • Post on the web, get a unique ID, register in catalogues…
https://okfn.org

Licensing research data openly
This DCC guide outlines the pros and cons of each approach and gives practical advice on how to implement a data licence.
www.dcc.ac.uk/resources/how-guides/license-research-data

CREATIVE COMMONS LIMITATIONS
NC  Non-Commercial  What counts as commercial?
SA  Share Alike     Reduces interoperability
ND  No Derivatives  Severely restricts use
These clauses are not open licences.
Horizon 2020 Open Access guidelines point to: [image] or [image]

EUDAT licensing tool
Researchers can answer a series of questions to determine which licence(s) are appropriate to use.
http://ufal.github.io/lindat-license-selector

Metadata standards
• Metadata is basic descriptive information to help others identify and understand the structure of the data, e.g. title, author...
• Documentation provides the wider context, e.g. the methodology / workflow, software and any information needed to understand the data
• Relevant standards should be used for interoperability
www.dcc.ac.uk/resources/metadata-standards

Data file formats
If researchers want their data to be re-used and sustainable in the long term, they should opt for open, non-proprietary formats.
Type             Recommended                                          Avoid for data sharing
Tabular data     CSV, TSV, SPSS portable                              Excel
Text             Plain text, HTML, RTF; PDF/A only if layout matters  Word
Media            Container: MP4, Ogg; Codec: Theora, Dirac, FLAC      Quicktime, H264
Images           TIFF, JPEG2000, PNG                                  GIF, JPG
Structured data  XML, RDF                                             RDBMS
Further examples: www.data-archive.ac.uk/create-manage/format/formats-table

Data repositories
http://service.re3data.org/search

Zenodo
• OpenAIRE-CERN joint effort
• Multidisciplinary repository
• Multiple data types
   – Publications
   – Long tail of research data
• Citable data (DOI)
• Links funding, publications, data & software
www.zenodo.org

Plan for sharing from the outset
Many decisions taken early on in the project will affect whether the data can be made openly available. Researchers should:
• Ensure consent agreements also include permission to archive and share data for reuse by others
• Seek permissions for more than just the primary project purpose if signing licences to reuse third-party data. Derivative data may not be shareable if it includes somebody else’s IP
• Explore the potential for openness when drafting agreements with commercial partners

REVIEWING DATA MANAGEMENT PLANS
What to look for in Data Management Plans
Image CC-BY-NC-SA by Ralf Appelt www.flickr.com/photos/adesigna/4090782772

Horizon 2020 templates
The DMP should address the points below on a dataset-by-dataset basis:
• Data set reference and name
• Data set description
• Standards and metadata
• Data sharing
• Archiving and preservation (including storage and backup)
Annex 2 (mid-term & final review)
Scientific research data should be easily:
1. Discoverable
2. Accessible
3. Assessable and intelligible
4. Useable beyond the original purpose for which it was collected
5.
Interoperable to specific quality standards
Annex 1 (by month 6)
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

Common themes to cover
• Data Description
• Standards and Metadata (discoverable / usable / interoperable)
• Data Sharing (as open as possible, as closed as necessary)
• Archiving and preservation

Key things to check
• Is the plan appropriate?
   – adopting relevant standards
   – practices in line with norms for that field
   – use of support services e.g. university storage, subject repositories…
• Does it seem feasible to implement?
• Has sufficient information been provided?
• Has advice been sought where needed?
• Are restrictions and costs properly justified?
Main judgement to make: Has the researcher taken time to reflect on what to do? There are no absolute right answers. You just want to be reassured that due consideration has been given and the approach seems reasonable.

Data Description
• Is it clear what data will be collected?
• Are appropriate file formats proposed?
• Has the reuse or integration of existing data been considered? (if appropriate)
• If third-party data will be reused, has sharing been considered in the licence agreements?

Standards and Metadata
• Will enough contextual information and structured metadata be provided to allow others to find, understand and reuse the data?
• Will the data be documented during the research? Has time been allocated to this?
• Will formal standards be used? (where available)
• Is information being captured & shared on the associated software and tools needed for reuse and reproducibility?

Data Sharing
• Is it clear which data will be shared and with whom?
   – Are opportunities to share data openly maximised? e.g. by seeking consent to share, anonymising data…
   – If data can’t be shared, are the reasons why explained?
• Will the data be easily accessible and openly licensed?
• If an embargo period is planned, is that in line with norms for that discipline?
• Will persistent IDs be assigned for discovery and citation?

Archiving and Preservation (incl. storage)
• Will the research data be deposited in a suitable community database, repository or archive?
• Are there any costs associated with preservation, and if so, how will these be covered?
• Will the data be stored and backed up appropriately during the research project? e.g. on managed university filestores rather than external hard drives

Reviewing DMPs
Useful guidelines
• ESRC guidance for peer-reviewers
  www.esrc.ac.uk/_images/Data-Management-Plan-Guidance-for-peer-reviewers_tcm8-15569.pdf
• MRC guidelines
  www.mrc.ac.uk/documents/pdf/data-management-plans-guidance-for-reviewers
• Johns Hopkins grant reviewers cribsheet
  https://dmp.data.jhu.edu/resources/grant-reviewers-guide
How to assess DMPs: forthcoming guide

DCC support on Data Management Plans
• Checklist on what to include
• How-to guide on developing a plan
• Webinars and training materials
• DMPonline tool
• Example DMPs
www.dcc.ac.uk/resources/data-management-plans

DMPonline
A web-based tool to help researchers write DMPs
Includes a template for Horizon 2020
https://dmponline.dcc.ac.uk

Example data management plans
• Technical appendix submitted to AHRC by Bristol Uni
  http://data.blogs.ilrt.org/files/2014/02/data.bris-AHRC-example-Technical-Plan.pdf
• Rural Economy & Land Use (RELU) programme examples
  http://relu.data-archive.ac.uk/data-sharing/planning/examples
• UCSD example DMPs (20+ scientific plans for NSF)
  http://libraries.ucsd.edu/services/data-curation/data-management/dmp-samples.html
• LSHTM guide and worked example for Wellcome Trust
  www.lshtm.ac.uk/research/researchdataman/plan/wellcometrust_dmp.pdf
• Further examples: www.dcc.ac.uk/resources/data-management-plans/guidance-examples

Thanks for listening
DMP guidance, tools & resources: www.dcc.ac.uk/resources/data-management-plans
Follow us on Twitter:
@digitalcuration and #ukdcc #DMPonline

Exercise: reviewing DMPs
In pairs or small groups:
1. Read through the example DMP or one you brought along (5 mins)
2. Discuss what you think about the example DMP (10 mins)
   – Did you get a clear sense of what data will be created?
   – Were particular standards and file formats named and explained?
   – Is there enough information about how the data will be made available?
   – Will the data be deposited in a repository for preservation?
3. Report back the main points from your discussion (5 mins)
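
Optional aid for the exercise
Before judging the quality of a plan, a reviewer may want a quick check that each point of the Horizon 2020 template is addressed at all. The Python sketch below is one possible way to do that. It assumes a plan for one dataset is held as a dictionary mapping section names to free text; that representation, the function name missing_sections and the example plan are all hypothetical and are not a DMPonline format or an official tool. It only flags absent or blank sections and says nothing about whether the content is reasonable, which remains the reviewer's judgement.

```python
# Section names are taken from the Horizon 2020 template list in this deck.
# Assumption: a per-dataset plan is a dict of section name -> free text.
H2020_SECTIONS = [
    "Data set reference and name",
    "Data set description",
    "Standards and metadata",
    "Data sharing",
    "Archiving and preservation",
]


def missing_sections(dataset_plan: dict[str, str]) -> list[str]:
    """Return the template sections that are absent or left blank."""
    return [
        section
        for section in H2020_SECTIONS
        if not dataset_plan.get(section, "").strip()
    ]


# Hypothetical example plan, invented purely for illustration.
example_plan = {
    "Data set reference and name": "Survey-2015 (hypothetical)",
    "Data set description": "Responses from an online questionnaire.",
    "Standards and metadata": "",  # present but empty
    "Data sharing": "Deposit in Zenodo under CC-BY after anonymisation.",
    # "Archiving and preservation" is not addressed at all
}

print(missing_sections(example_plan))
```

For the example plan this reports "Standards and metadata" and "Archiving and preservation" as gaps to raise with the researcher.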