Facilitate Open Science Training for European Research

Plus ça change: Experiences from a decade of environmental research data management
Dr. Peter Mooney

A furrow we have been ploughing at EPA since 2004

Let's go back a decade or so...
● Athens held the Summer Olympics (2004)
● Google went public, making Brin and Page instant billionaires
● LOTR: The Return of the King won Best Picture at the Oscars (2004)
● “Clocks” by Coldplay won Record of the Year at the Grammys (2004)
● Brian Cowen became Minister for Finance
● Sumatra–Andaman earthquake / Boxing Day tsunami
https://www.flickr.com/photos/kewonflickr/9292691950/

Twitter wasn't even on the scene (it was born in March 2006)
http://dot-social.co.uk/wp-content/uploads/2014/04/twitter-icon-by-Jurgen-Appelo-on-Flickr-2.0-Generic-CC-BY-2.0-.jpg

Facebook was taking its first steps
"Thefacebook" by Source. Licensed under Fair use via Wikipedia - http://en.wikipedia.org/wiki/File:Thefacebook.png#/media/File:Thefacebook.png

YouTube was also taking its first steps
"YouTube screenshot 2005" by Source (WP:NFCC#4). Licensed under Fair use via Wikipedia - http://en.wikipedia.org/wiki/File:YouTube_screenshot_2005.png#/media/File:YouTube_screenshot_2005.png

Android arrived in 2008, after the iPhone in 2007.
But we still really loved the Nokia 1680

2004: IEEE Multimedia Vol 11 (4)

Funding of environmental research was also undergoing great changes
● The EPA would manage and administer a substantial research budget
● Wider remit of research themes and scientific domains
● Closer links to policy and current needs
● Scaling up of funded environmental research in Ireland
EPA ERTDI Funding Programme 2002
EPA ERTDI Funding Programme 2004
EPA STRIVE Funding Programme 2007

A very simplified view of the research project lifecycle
[Figure: Funding Awarded → Project Begins → Project Research and Development → Project Finished → Reporting obligations (reports, papers, etc.); timescales shown: 1–2 years; project lifetime: 6 months to 5 years]

“Data” was the weakness in all of the EPA Research programmes
● Successful funding of Environmental Research in Ireland
● A growing capacity for Environmental Research and Development
● Funded projects delivered reports and policy, journal papers, training, etc.
● No data ...

Access to the outputs from environmental research looked very much like this structure

In 2004/2005 we proposed a radical open access approach to Environmental Research in Ireland
● All data created, generated and collected by EPA-funded research projects would be made publicly available, free of charge
● This was part of the research contract
● The EPA Scientific Committee and Research Team collaborate to identify the data/information for public distribution
● The EPA committed to the long-term management of the data and related infrastructure

This radical approach would link the primary data to the published reports and papers
[Figure: REPORTS & PAPERS]

Researchers needed IT and other structural support to make their data publicly accessible and available
● Embargo period: up to 12 months after project end
● Access control: ability to control access to certain files/datasets
● Update and change: ability to manage the files into the future

So we (I) built SAFER according to these precise specifications...
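The embargo requirement above (files released no later than 12 months after project end) boils down to a date check. The sketch below is a minimal illustration under stated assumptions, not SAFER's actual code: the helpers `add_months`, `release_date` and `is_public` and the `embargo_months` parameter are hypothetical, and access control for individual files would be a separate check.

```python
import calendar
from datetime import date

# Hypothetical illustration of the embargo rule; not taken from SAFER.
MAX_EMBARGO_MONTHS = 12  # limit from the slide: up to 12 months after project end


def add_months(d: date, months: int) -> date:
    """Shift d forward by whole months, clamping the day to the end of the month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def release_date(project_end: date, embargo_months: int) -> date:
    """Date from which an embargoed file becomes publicly downloadable."""
    if not 0 <= embargo_months <= MAX_EMBARGO_MONTHS:
        raise ValueError(f"embargo must be between 0 and {MAX_EMBARGO_MONTHS} months")
    return add_months(project_end, embargo_months)


def is_public(project_end: date, embargo_months: int, today: date) -> bool:
    """True once the embargo period has elapsed."""
    return today >= release_date(project_end, embargo_months)


if __name__ == "__main__":
    # A project ending on 31 August 2014 with a 6-month embargo.
    print(release_date(date(2014, 8, 31), 6))
```

Clamping the day (31 August plus six months becomes 28 February) is one reasonable convention for month arithmetic; an archive could equally define the embargo in days.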
The SAFER architecture: easy for us to extend, whether updating existing functionality or adding new functionality as needed
Metadata: ISO 19115 and Dublin Core

As an open data access tool, SAFER has been very successful over the past several years
● Around 5,500 downloads per year
● Almost 3,500 datasets/files publicly available for access and download
● Approximately 200 different project resources on SAFER
● Approximately 80 GB of data

SAFER is used by the EPA itself to provide open access to some key EPA datasets
SAFER is an example of a tool developed for the research audience but expanded to a wider audience

We attempt to integrate SAFER into the project lifecycle of EPA-funded research projects
● Projects are provided with an account on SAFER early in the project lifecycle
● Engagement with the project on data management practices (if required, but recommended)
● The research project creates a metadata resource on SAFER close to project end
● Ongoing process of dataset upload
● Direction from the EPA, steering groups, etc.

But there is still a disconnect between funded research projects and access to research data...

What have we learned over this past decade in making Environmental Research Data publicly accessible in Ireland?

So has anything really changed? YES and NO...
● YES: The “digital natives” involved in Environmental Research are more open
● NO: We're still struggling to find the correct structures for open access to data
● NO: Data management and making data openly accessible is still a low priority, and still misunderstood

There is still a lack of understanding of data management principles
● Metadata is still a mystery to most researchers (metadata about projects, metadata about data...)
● Metadata is still an annoyance or burden to most researchers (too time consuming)
● There is a lack of appreciation of the need to manage data correctly (for example, using databases rather than spreadsheets...)
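A basic metadata record need not be a large burden. As a rough illustration, the sketch below builds a simple Dublin Core record (one of the two metadata standards SAFER uses) as XML with Python's standard library. The 15 element names come from the Dublin Core Metadata Element Set 1.1; the unqualified `<metadata>` wrapper, the helper `dublin_core_record` and the sample values are hypothetical and do not reflect SAFER's actual schema. An ISO 19115 record is considerably richer than this.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise elements as dc:title, dc:creator, ...

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language", "relation",
    "coverage", "rights",
}


def dublin_core_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build a <metadata> element containing repeatable dc:* child elements."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:  # every Dublin Core element may repeat
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


if __name__ == "__main__":
    # Hypothetical values for an imaginary project dataset.
    record = dublin_core_record({
        "title": ["Hypothetical river water quality survey"],
        "creator": ["Example Research Project"],
        "date": ["2014"],
        "format": ["text/csv"],
        "rights": ["Open access, free of charge"],
    })
    print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown keys keeps typos such as "author" out of the record instead of silently producing non-standard elements.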
“But our project didn't generate any data or information that could be made publicly available”

Microsoft Excel is still completely dominant in Environmental Research
● Excel is used almost universally for analysis, visualisation, data storage, data capture, etc.
● Without very detailed metadata and supporting documentation, this greatly reduces the reusability of the data
Excel datasets can be wildly complex

The EPA offers researchers the opportunity to discuss data management plans
● Before data collection, surveys, modelling, etc. get underway, discuss plans and requirements with the EPA Research IT expert
● This consultation has many advantages:
  – Opportunities for the researchers to undertake proper data modelling from the beginning
  – Consider the use of databases rather than Excel
  – Guidance on how to separate data collection, analysis, visualisation and long-term storage
https://www.flickr.com/photos/xaimex/34053752

The FEAR of allowing data to be freely and openly available is pervasive: this leads to data remaining hidden for years!

The road ahead: Future Work
https://www.flickr.com/photos/stuckincustoms/4848088053/sizes/l

Environmental data, geographic data, geoscientific data, etc. will be heavily influenced by INSPIRE
We need to begin thinking seriously about making data more “INSPIRE friendly”

Observations & Measurements (O&M) is an international standard for modelling observation events and describing their relations to the target spatial objects under observation, the measured properties and measurement procedure, and the captured data resulting from those observation events.
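To make the O&M description above more concrete, here is a minimal sketch that represents one observation event as a Python dataclass and converts a flat, spreadsheet-style row into it. The fields loosely follow the core O&M observation properties (phenomenon time, result time, feature of interest, observed property, procedure, result), with a numeric result and its unit of measure. The class, the `from_row` helper and the column names (`sampled_at`, `site_id`, `parameter`, ...) are hypothetical; this is not an INSPIRE or OGC encoding.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One observation event, loosely modelled on the core O&M concepts."""
    phenomenon_time: datetime  # when the observed property applied to the feature
    result_time: datetime      # when the result became available
    feature_of_interest: str   # identifier of the spatial object under observation
    observed_property: str     # the property that was measured
    procedure: str             # sensor, method or lab protocol that produced the result
    result: float              # the measured value
    unit: str                  # unit of measure of the result


def from_row(row: dict[str, str]) -> Observation:
    """Map one spreadsheet-style row (hypothetical column names) to an Observation."""
    return Observation(
        phenomenon_time=datetime.fromisoformat(row["sampled_at"]),
        result_time=datetime.fromisoformat(row["analysed_at"]),
        feature_of_interest=row["site_id"],
        observed_property=row["parameter"],
        procedure=row["method"],
        result=float(row["value"]),
        unit=row["unit"],
    )


if __name__ == "__main__":
    # Hypothetical water-sampling row as it might appear in a spreadsheet.
    row = {
        "sampled_at": "2014-06-01T09:30:00",
        "analysed_at": "2014-06-03T14:00:00",
        "site_id": "site-001",
        "parameter": "dissolved oxygen",
        "method": "laboratory method X",
        "value": "8.2",
        "unit": "mg/L",
    }
    print(from_row(row))
```

Keeping the phenomenon time and result time as separate fields, rather than a single "date" column, mirrors the O&M distinction between when something was observed and when the result was produced (for a forecast, the result can even precede the phenomenon).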
O&M Example 1 (figure)

Data Management MUST become part of undergraduate curricula in Science and Arts
● Part of the wider problem around open access and data management grows from a lack of skills/understanding in this area
● Data Management (introduction to databases, metadata, data formatting, etc.) must become part of undergraduate curricula
● This should be considered an essential part of professional scientific training

GI-N2K Project (FP7) 2013–2016
The challenge is to better align GI S&T curricula at the academic level and in vocational training offers with the needs of the GI job market.

How do we ensure access to data and datasets into the future as the ways of storing data change?
We feel that there is an increased danger of losing some datasets or data forever if researchers do not “buy into” open access and data management practices
https://www.flickr.com/photos/glynlowe/10921733615

A decade ago we all stored our data physically, on media we could “touch”
Then the removable media (USB) revolution came along: suddenly we were able to bring our data anywhere
Today we have so much (cheap) storage that we cannot keep track of the amount of data we have
(Checked on Komplett.ie in April 2015)
And we also have “The Cloud”
Cloud storage gives us the comfort of data access anywhere, any way, any time and almost without limits

Over the next few months the EPA is redesigning the whole architecture of SAFER
● Optimising project reporting: metadata is reported once, in one place in the EPA, and then reused in several other services
● Encouraging researchers to make data and information resources available for long-term storage and open access
● Building better researcher services using web-based services
● Technologies have moved on beyond anyone's expectations in 10 years
● ICT infrastructures for Research provide incredible opportunities for collaboration, connectivity, learning and discovery
● Still ....
the very fabric of open access is researchers and scientists with their datasets and data
● Education, consultation, implementation!

In summary: much has changed, but many things have stayed the same

Dr. Peter Mooney
Email: p.mooney@epa.ie
https://www.flickr.com/photos/peterm7/8572375565