A tale of research data management in practice
Veerle Van den Eynden, UK Data Archive, University of Essex
Embracing Data Management - Bridging the Gap Between Theory and Practice
VLIR-FOSTER seminar, Brussels, 4 June 2015

Science advances through data sharing
Data used (national surveys): Public Risk Perceptions, Climate Change and the Reframing of UK Energy Policy in Britain, 2005; Public Perceptions of Climate Change and Energy Futures in Britain, 2010; …

Society benefits from data sharing

Helping researchers manage and share data
• Data management = organisation, documentation, storage, safeguarding, preservation and accessibility of data, incl. ethical and legal aspects of data handling and data ownership
• Data sharing = release of data for use by other people

Why manage research data well?
• Data creation in research is often expensive
• Data = cornerstone of research
• Data underpin published findings
• Good quality data = good quality research
• Protect data from loss, destruction,…
• Compliance with ethical codes, data protection laws, journal requirements, funder policies
Research integrity
Openness

Boost factors: research funders, EU
European open access policies: Horizon 2020, European Research Council (ERC)
• communication & recommendation on access to / preservation of scientific information (July 2012), covering publications & research data
• pilot on open access to research data in Horizon 2020, primarily data underlying (open access) scientific publications
• data management guidelines for Horizon 2020 (~ policies)
Generally based on OECD Principles and Guidelines for Access to Research Data from Public Funding

Boost factors: research funders, UK
• Publicly funded research data are a public good, produced in the public interest, that should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.
• in accordance with relevant standards and community best practice
• metadata to make research data discoverable
• legal, ethical, commercial constraints on release of research data
• recognition for collecting & analysing data; limited privileged use
• acknowledge sources of data, intellectual contributions, terms & conditions
• use public funds to support the management and sharing of publicly-funded research data
Research Councils UK Common Principles on Data Policy (May 2011)

Boost factors: data infrastructure, UK
Research funders invest in data support services and infrastructure, e.g.
• UK Data Service (ESRC)
• NERC data centres (NERC)
• MRC Data Support Service
• GenBank (BBSRC, MRC)
• Atlas Petabyte Storage (STFC)
• Archaeology Data Service (AHRC)

UK data centres
• Archaeology Data Service
• Biomedical Informatics Research Network Data Repository
• British Atmospheric Data Centre
• British Library National Sound Archive
• British Oceanographic Data Centre
• Cambridge Crystallographic Data Centre
• ChemSpider
• ChemSpider Synthetic Pages
• eCrystals
• Endangered Language Archive
• Environmental Information Data Centre
• Ethno-ornithology World Archive
• National Biodiversity Network
• National Geoscience Data Centre
• NERC Earth Observation Data Centre
• NERC Environmental Bioinformatics Centre
• Polar Data Centre
• The Oxford Text Archive
• UK Data Archive
• UK Solar System Data Centre
• Visual Arts Data Service

Boost factors: training
• Research Data Management Training MANTRA (Edinburgh)
– online learning units
– http://datalib.edina.ac.uk/mantra/
• Digital Curation Centre:
– Data management planning: http://www.dcc.ac.uk/resources/data-management-plans
– DMP Online tool: https://dmponline.dcc.ac.uk/
– Data management training / courses: http://www.dcc.ac.uk/training/data-management-courses-and-training

UK Data Service
• Curator of the largest collection of digital data in the social sciences and humanities in the UK
• UK Data Archive
(www.data-archive.ac.uk) is the lead organisation in a network
• Based at the University of Essex, operating essentially as a university department
• Extensive experience of supporting researchers and other creators of social science data (and data from related disciplines)
• We have managed data sharing for the ESRC since 1995
• Our best practice approaches to making data shareable are based on:
– challenges researchers face in sharing data
– archiving research data, both quantitative and qualitative
www.ukdataservice.ac.uk

What we do in practice
Research & strategy:
• ESRC research data policy co-development
• Research on data sharing practices, e.g. Incentives and motivations for sharing research data: a researcher’s perspective (http://t.co/K6P006cROH)
Analyse needs:
• evaluate existing data management practices
• engage with and work with researchers
• identify obstacles to data sharing
Provide solutions for researchers & institutions:
• practical strategies to embed data management into research practices
• tools and templates
• guidance, training and bespoke advice
• help control archive ingest costs (ingest is the largest share of costs, more than access and preservation)

Our data management guidance
• Online best practice guidance: ukdataservice.ac.uk/manage-data.aspx
• Managing and Sharing Research Data: A Guide to Good Practice (Sage Publications Ltd)
• Helpdesk for queries: ukdataservice.ac.uk/help/get-in-touch.aspx
• Training: www.data-archive.ac.uk/create-manage/advice-training/events

Our guidance
• plan to share research data
• legal and ethical aspects of data sharing and reuse
• data copyright
• documentation and metadata to understand and use data
• file formats, organising, versioning and quality control
• storage, backup, encryption and security of data and files
• strategies for collaborative research

A taster of our guidance
Options for sharing confidential data
• Obtain informed consent, also for data sharing and preservation / curation
• Protect identities e.g.
anonymisation, not collecting personal data
• Regulate access where needed (all or part of data), e.g. by group, use, time period
• Store personal or sensitive data securely (and separately)

Consent needed across the data life cycle
• Engagement in the research process – decide who approves final versions of transcripts
• Dissemination in presentations, publications, the web – decide who approves research outputs
• Data sharing and archiving – consider future uses of data
Always dependent on the research context – special cases for covert research, verbal consent, etc.

Anonymising data
• Direct identifiers – often not essential research information
• Indirect identifiers
• Remove / pseudonymise direct identifiers, e.g. names, address, institution, photo
• Reduce the precision/detail of a variable through aggregation, e.g. birth year vs. date of birth, occupational categories, area rather than village
• Generalise the meaning of a detailed text variable, e.g. occupational expertise
• Restrict the upper and lower ranges of a variable to hide outliers, e.g. income, age

Managing data access
• UK Data Service: web access to data and metadata
• Data free to use; charges apply for commercial use
• Metadata / documentation always open
• Data available under 3 access levels:
OPEN
SAFEGUARDED – End User Licence (e.g. users must
not identify any potentially identifiable individuals)
– Special agreements: depositor permission; approved researcher
– Embargo for a fixed time period
CONTROLLED – only for accredited users
– Access via on-site or virtual secure environment (Secure Lab)

ESRC data management plan
ESRC DMP guidance:
• Assessment of existing data
• Information on new data
• Quality assurance of data
• Backup and security of data
• Difficulties in data sharing and measures to overcome these
• Consent, anonymisation, re-use strategies
• Copyright / Intellectual Property Ownership
• Responsibilities
• Management and curation

Tools & templates
• Model consent form: http://www.data-archive.ac.uk/media/112638/ukdamodelconsent.pdf
• Survey consent statement: http://data-archive.ac.uk/media/147338/ukdasurveyconsent.doc
• Transcription template: http://data-archive.ac.uk/media/136055/ukdamodeltranscript.pdf
• Transcription instructions: http://data-archive.ac.uk/media/285633/ukda-example-transcription-instructions.pdf
• Transcription confidentiality agreement: http://data-archive.ac.uk/media/285636/ukda-transcriber-confidentiality-agreement.pdf
• Data list template: http://data-archive.ac.uk/media/2989/UK%20Data%20Archive%20Example%20Data%20List.pdf
• RDM costing tool: www.data-archive.ac.uk/media/247429/costingtool.pdf

Data management training
• Regular workshops on ‘Managing and Sharing Research Data’
• Bespoke training events by invitation, e.g.
– ICPSR summer school, Curating and Managing Research Data for Reuse (July 2015): http://www.icpsr.umich.edu/icpsrweb/sumprog/courses/0149
– Doctoral training: managing and sharing research data, University of Ghent, Faculty of Psychology and Educational Sciences (Dec 2014): http://ukdataservice.ac.uk/news-and-events/eventsitem/?id=3914
– FOSTER-CESSDA RDM doctoral training series, Lausanne, Ljubljana, Cologne, Manchester (May-Nov 2015): http://ukdataservice.ac.uk/about-us/projects/foster-cessda-training/details.aspx

ReShare – help guidance

Questions

Contact details
veerle@essex.ac.uk
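The anonymisation techniques listed under "Anonymising data" (removing or pseudonymising direct identifiers, aggregating detailed variables, and restricting the upper and lower ranges of a variable) can be illustrated with a minimal Python sketch. The record fields, the hypothetical `VILLAGE_TO_AREA` lookup and the income bounds are illustrative assumptions, not UK Data Service tools or recommended thresholds. The range restriction here simply clamps values to the bounds, which is the most basic form of top- and bottom-coding.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical respondent record; the field names are illustrative assumptions.
@dataclass
class Respondent:
    name: str            # direct identifier
    address: str         # direct identifier
    date_of_birth: date  # indirect identifier
    village: str         # indirect identifier
    income: int          # indirect identifier: extreme values can single people out


# Hypothetical researcher-supplied lookup that coarsens village to a wider area.
VILLAGE_TO_AREA = {"Village A": "North district", "Village B": "North district"}


def top_bottom_code(value: int, lower: int, upper: int) -> int:
    """Clamp value to [lower, upper] so outliers are not reported exactly."""
    return max(lower, min(value, upper))


def anonymise(records, income_bounds=(10_000, 100_000)):
    """Return (anonymised rows, pseudonym key).

    The key links pseudonyms back to names, so it should be stored
    separately from the anonymised data and kept secure.
    """
    key = {}
    rows = []
    for r in records:
        # Pseudonymise the name consistently: the same person always gets the same code.
        if r.name not in key:
            key[r.name] = f"P{len(key) + 1:03d}"
        rows.append({
            "id": key[r.name],
            # The address is dropped entirely; it is not needed for analysis.
            "birth_year": r.date_of_birth.year,                    # date of birth -> birth year
            "area": VILLAGE_TO_AREA.get(r.village, "Other area"),  # village -> area
            "income": top_bottom_code(r.income, *income_bounds),   # hide income outliers
        })
    return rows, key
```

Returning the pseudonym key separately reflects the advice to store personal data securely and apart from the research data. In published datasets, a top-coded value is often shown as a band label (for example "100,000+") rather than silently clamped, so that users can see the value was restricted.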