Facilitate Open Science Training for European Research

Research data management (RDM): what do support services need to know... and do?
Martin Donnelly, Digital Curation Centre, University of Edinburgh

OVERVIEW
1.  Introductions and definitions
2.  Drivers for RDM
3.  What does it mean for researchers?
4.  What does it mean for support staff?

1. INTRODUCTIONS AND DEFINITIONS

What is RDM?
A definition…
“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”

What sort of activities?
-  Planning and describing data-related work before it takes place
-  Documenting your data so that others can find and understand it
-  Storing it safely during the project
-  Depositing it in a trusted archive at the end of the project
-  Linking publications to the datasets that underpin them

Data management is a part of good research practice.
- RCUK Policy and Code of Conduct on the Governance of Good Research Conduct

Okay, but what is ‘data’ exactly?
•  Definitions vary from discipline to discipline, and from funder to funder…
•  Here’s a science-centric definition:
•  “The recorded factual material commonly accepted in the scientific community as necessary to validate research findings.” (US Office of Management and Budget, Circular A-110)
•  [Addendum: This policy applies to scientific collections, known in some disciplines as institutional collections, permanent collections, archival collections, museum collections, or voucher collections, which are assets with long-term scientific value. (US Office of Science and Technology Policy, Memorandum, 20 March 2014)]
•  And another from the visual arts:
•  “Evidence which is used or created to generate new knowledge and interpretations. ‘Evidence’ may be intersubjective or subjective; physical or emotional; persistent or ephemeral; personal or public; explicit or tacit; and is consciously or unconsciously referenced by the researcher at some point during the course of their research.” (Leigh Garrett, KAPTUR project: see http://kaptur.wordpress.com/2013/01/23/what-is-visual-arts-research-data-revisited/)

From data to research objects?
•  “Research object” is a term that is gaining in popularity, not least in the humanities where the relevance of the term ‘data’ is not always recognised…
•  Research objects can comprise any supporting material which underpins or otherwise enriches the (written) outputs of research
   -  Data (numeric, written, audiovisual…)
   -  Software code
   -  Workflows and methodologies
   -  Slides, logs, lab books, sketchbooks, notebooks, you name it!
•  See http://www.researchobject.org/ for more info

Helicopter view: What are the benefits of active RDM?
•  TRANSPARENCY: The evidence that underpins research can be made open for anyone to scrutinise, and attempt to replicate findings.
•  EFFICIENCY: Data collection can be funded once, and used many times for a variety of purposes.
•  RISK MANAGEMENT: A pro-active approach to data management reduces the risk of inappropriate disclosure of sensitive data, whether commercial or personal.
•  PRESERVATION: Lots of data is unique, and can only be captured once. If lost, it can’t be replaced.

2. DRIVERS FOR RDM
1. Technological developments
2. Value for money / Return on investment
3. Risk management
4. Transparency, integrity and good scholarly practice

•  Developments in sensor technology, networking and digital storage enable new research and scientific paradigms
•  As costs also fall, possibilities for data sharing, citation and re-use become much more widespread
•  Journals dedicated solely to publishing data have even started to appear.
   That’s not to say it’s an entirely new thing: journals have always published data, just never before at such scale…

Technology
Rosse from Philosophical Transactions of the Royal Society (MDCCCLXI) (or 1861 if you’d prefer)

Repurposing / VfM via data re-use
Ships’ log books build picture of climate change
14 October 2010
You can now help scientists understand the climate of the past and unearth new historical information by revisiting the voyages of First World War Royal Navy warships. Visitors to OldWeather.org will be able to retrace the routes taken by any of 280 Royal Navy ships. These include historic vessels such as HMS Caroline, the last survivor of the 1916 Battle of Jutland still afloat. By transcribing information about the weather and interesting events from images of each ship's logbook, web volunteers will help scientists build a more accurate picture of how our climate has changed over the last century.
http://www.nationalarchives.gov.uk/news/503.htm
Detail from Royal Navy Recruitment poster, RNVR Signals branch, 1917 (Catalogue reference: ADM 1/8331)
Endeavour, 1768-71 (Captain Cook)
HMS Beagle, 1830-34
HMS Torch, 1918

Funder principles/expectations
•  Major funders in many countries now have data management policies, and mandate data management plans (DMPs)
•  In the UK, the RCUK councils have seven shared principles which underpin their individual policies…
   1.  Data as a public good
   2.  Preservation
   3.  Discovery
   4.  Confidentiality
   5.  First use
   6.  Recognition
   7.  Public funding
•  Six of the seven RCUK councils require data management plans (or equivalent), as do Wellcome Trust, Cancer Research UK, and more…
•  The European Commission is running an Open Data pilot in Horizon 2020, about which more later…

Risk management
Controversial FOI requests to…
-  University of East Anglia
-  Queen’s University Belfast
-  University of Stirling

Research quality and integrity
-  Reinhart & Rogoff (2010) “Growth in a Time of Debt” - paper not peer-reviewed, data not initially made available…
-  Very influential and repeatedly cited by politicians to lend weight to economic strategy
-  Multiple issues (selective exclusions, unconventional weightings, coding error) identified by a postgrad researcher attempting to replicate the paper’s findings
-  Widespread embarrassment, but at least the errors were discovered!

3. WHAT DOES IT MEAN FOR RESEARCHERS?
•  A disruption to their working processes
•  Additional expectations / requirements from the funders
•  But! It provides opportunities for new types of investigation
•  And leads to a more robust scholarly record

The old way of doing things
1. Researcher collects data (information)
2. Researcher interprets/synthesises data
3. Researcher writes paper based on data
4. Paper is published (and preserved)
5. Data is left to benign neglect, and eventually ceases to be accessible

Without intervention, data + time = no data
Vines et al. “examined the availability of data from 516 studies between 2 and 22 years old”
-  The odds of a data set being reported as extant fell by 17% per year
-  Broken e-mails and obsolete storage devices were the main obstacles to data sharing
-  Policies mandating data archiving at publication are clearly needed
“The current system of leaving data with authors means that almost all of it is lost over time, unavailable for validation of the original results or to use for entirely new purposes” according to Timothy Vines, one of the researchers.
This underscores the need for intentional management of data from all disciplines and opened our conversation on potential roles for librarians in this arena.
(“80 Percent of Scientific Data Gone in 20 Years,” HNGN, Dec. 20, 2013, http://www.hngn.com/articles/20083/20131220/80-percent-of-scientific-data-gone-in-20-years.htm.)
Vines et al., The Availability of Research Data Declines Rapidly with Article Age, Current Biology (2014), http://dx.doi.org/10.1016/j.cub.2013.11.014

The new way of doing things
Plan · Collect · Assure · Describe · Preserve · Discover · Integrate · Analyze
SHARE …and RE-USE
The DataONE lifecycle model

What does it mean for universities?
•  Three principal areas of focus
   -  Developing and integrating their technical infrastructure (storage space, repositories/CRIS systems, data catalogues, etc)
   -  Developing human infrastructure (creating policies, assessing current data management capabilities, identifying areas of good practice, data management plan templates, tailoring training and guidance materials…)
   -  Developing business plans for sustainable services / roles
•  Forming cross-function (hybrid) working groups, advisory groups, task forces, etc…
http://blog.soton.ac.uk/keepit/2010/01/28/aida-and-institutional-wobbliness/

4. WHAT DOES IT MEAN FOR SUPPORT STAFF?
•  Need to understand the key elements in the process, as well as roles and responsibilities
•  Understand the key points of the funders’ requirements
•  Expect questions from researchers…

Understand the different roles
•  Three main roles for research support staff…
   -  Compliance: checking adherence with funder policies, at both ends of the funding process (pre-award and end-of-project)
   -  Guidance: helping researchers meet expectations and requirements
   -  Selection etc: some staff may also have an appraisal and retention role, making decisions re. what the institution will want to keep / share, under what conditions, and for how long.
      There are various reasons for universities to want to keep some datasets, and to get rid of others.
•  Different institutions organise their provision in different ways; there’s no one-size-fits-all approach

Understand funder requirements
•  The DCC maintains an overview of the major UK and European funders’ data-related expectations / requirements
   -  http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
•  The European Commission has introduced an Open Data pilot in Horizon 2020
   -  Details: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

Expect questions from researchers
•  …and know where to direct enquiries
•  Also, expect resistance!
•  It can pay to focus on the benefits rather than hammering home the requirements, but be clear that DMPs (and good RDM practice, more broadly) are no longer optional for many funders (and publishers)

Last slide: take-home messages
•  Research data management (RDM) is…
   -  An integral part of doing quality research in the 21st century
   -  Increasingly expected / mandated by funders, publishers and others
   -  An opportunity for new discoveries and different approaches to research
   -  A safeguard against inappropriate data disclosure
   -  An activity that requires careful planning and consideration, and – ideally – coordination and support across many stakeholder types

THANK YOU
Martin Donnelly
Digital Curation Centre
University of Edinburgh
martin.donnelly@ed.ac.uk
Twitter: @mkdDCC
www.dcc.ac.uk
www.fosteropenscience.eu

Image credits: slide 3, http://www.flickr.com/photos/dougbelshaw/; slide 9, http://www.flickr.com/photos/rpmarks/; slide 24, https://www.flickr.com/photos/kaptainkobold
Thanks to Sarah Callaghan, PREPARDE, for the Rosse example.
All images are Creative Commons licensed.