Facilitate Open Science Training for European Research
RESEARCH DATA MANAGEMENT AND OPEN DATA
6-7 October 2015, University of Manchester, UK

ANONYMISATION
Veerle Van den Eynden, UK Data Service

Options for sharing research data that may contain confidential information
• Obtain informed consent, also for data sharing and preservation / curation
• Protect identities, e.g. anonymisation, not collecting personal data
• Regulate access where needed (all or part of data), e.g. by group, use, time period

Why anonymise research data?
Ethical reasons
  • protect people’s identity (sensitive, illegal, confidential info)
  • disguise research location
Legal reasons
  • not disclose personal data
Commercial reasons
Discuss with your research participants

Identity disclosure
A person’s identity can be disclosed through:
• direct identifiers, e.g. name, address, postcode, telephone number, voice, picture; often NOT essential research information (administrative)
• indirect identifiers: possible disclosure in combination with other information, e.g. occupation, geography, unique or exceptional values (outliers) or characteristics

Anonymising quantitative data
• Remove direct identifiers (or replace with pseudonyms), e.g. names, address, institution, photo
• Reduce precision/detail through aggregation, e.g. birth year vs. date of birth, occupational categories, area rather than village
• Generalise meaning of detailed text, e.g. occupational expertise
• Restrict upper or lower ranges to hide outliers, e.g. income, age

Anonymising qualitative data
• Remove direct identifiers, or replace with pseudonyms; often not essential research info
• Avoid blanking out; use pseudonyms or replacements
• Identify replacements, e.g. with [brackets]
• Plan or apply editing at time of transcription
• Avoid over-anonymising: removing information in text can distort data and make them unusable, unreliable or misleading, so balance anonymisation with the need to preserve context
• Be consistent within the research team and throughout the project
• Keep an anonymisation log of replacements or removals made, and keep it separate from the anonymised data files

Anonymisation log
Example: anonymisation log for interview transcripts

  Interview / Page   Original      Changed to
  Int1 p1            Spain         European
       p1            E-print Ltd   Printing
       p2            20th June     June
       p2            Amy           Moira
  Int2 p1            Francis       my friend

  P31. Joan → Mary
  P97. Carol → {Mother}
  P34. Colchester → {Town in S.E.England}
  P65. Welshpool High School → @@##High School##@@

Audio-visual data
Digital manipulation of audio and image files can remove personal identifiers, e.g. voice alteration, image blurring (e.g. of faces)
Labour intensive, expensive, may damage research potential of data
Better:
• obtain consent to use and share data unaltered for research purposes
• avoid mentioning disclosing information during audio recordings

In practice: example anonymisation

What if anonymising is impossible?
• obtain consent for sharing non-anonymised data
• regulate/restrict user access, e.g. at UK Data Archive:
  • archived data NOT in public domain
  • use of data for specific purposes only after user registration
  • data users sign legally binding End User Licence, e.g. not identify any potentially identifiable individuals
  • stricter access regulations for confidential data (case by case basis):
    • access to approved researchers only
    • requiring data access authorisation from data owner prior to data release
    • confidential data under embargo for given time period
    • secure access to data
• researchers: consider access to data and safe storage

Managing access to data
• Open: available for download/online access under open licence without any registration
• Safeguarded: available for download/online access to logged-in users who have registered and agreed to an End User Licence
• Controlled: available for remote or safe room access to authorised and authenticated users whose research proposal has been approved and who have received training

Can such research data be open?
• ESRC research data policy:
  • Publicly-funded research data are a public good, produced in the public interest, which shall be made openly available and accessible with as few restrictions as possible in a timely and responsible manner that meets a high ethical standard and does not violate privacy or harm intellectual property.
  • Openly available research data, with as few restrictions as possible, means in the ESRC context that research data will be made available for re-use free of charge, as open data, safeguarded data or controlled data; the access category being selected to minimise the risk of disclosing personal information

Open about data with restricted access
Publish:
• Which data exist
• Where data are kept, e.g. which repository
• Who can access them
• For which purpose can they be used
• Under which conditions

In practice: data with access conditions
Health and Social Consequences of the Foot and Mouth Disease Epidemic in North Cumbria, 2001-2003 (study 5407 in UK Data Archive collection) by M. Mort, Lancaster University, Institute for Health Research.
• Interviews (audio + transcript) and written diaries with 54 people
• 40 interview and diary transcripts are archived and available for re-use by registered users
• 3 interviews and 5 diaries are embargoed until 2015
• audio files archived and only available by permission from researchers

discover.ukdataservice.ac.uk/catalogue/?sn=5407
doc.ukdataservice.ac.uk/doc/5407/mrdoc/pdf/q5407userguide.pdf

In practice: access conditions ReShare

Questions?
• Veerle Van den Eynden
• veerle@essex.ac.uk
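
Worked sketch: anonymising quantitative data
The steps on the "Anonymising quantitative data" slide (remove direct identifiers, reduce precision, restrict upper ranges) can be illustrated with a short Python sketch. This is only one possible reading of those steps, not a UK Data Service procedure: the field names, the `anonymise_record` helper, the `INCOME_CAP` threshold and the sample record are all hypothetical and chosen for illustration.

```python
from datetime import date

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "address", "postcode", "telephone"}

# Assumed top-coding threshold; a real project would choose this per dataset.
INCOME_CAP = 100_000


def anonymise_record(record, pseudonym):
    """Return an anonymised copy of one survey record (a plain dict)."""
    # 1. Remove direct identifiers and attach a pseudonymous ID instead.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["respondent_id"] = pseudonym

    # 2. Reduce precision: keep the birth year rather than the date of birth.
    dob = out.pop("date_of_birth", None)
    if dob is not None:
        out["birth_year"] = dob.year

    # 3. Restrict the upper range: cap income and flag that it was capped.
    if "income" in out:
        out["income_top_coded"] = out["income"] > INCOME_CAP
        out["income"] = min(out["income"], INCOME_CAP)

    return out


# Invented sample data, not taken from any real study.
records = [
    {
        "name": "A. Example",
        "postcode": "XX1 1XX",
        "date_of_birth": date(1975, 6, 20),
        "income": 250_000,
        "occupation": "farmer",
    },
]

anonymised = [
    anonymise_record(r, f"R{i:03d}") for i, r in enumerate(records, start=1)
]
print(anonymised[0])
```

The mapping from pseudonym to original identity is deliberately not stored in the output; as with the anonymisation log for transcripts, any such key would need to be kept separately from the anonymised data files. Indirect identifiers such as occupation are left untouched here and would still need a disclosure check.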