bytes, sweat and tears: what it takes to build an open data exchange ecosystem

lennart martens
lennart.martens@vib-ugent.be
computational omics and systems biology group
VIB / Ghent University, Ghent, Belgium

Why share our data? - Because it is the future, stupid.
The bricks in a non-profit (domain specific) data bazaar
The cavalry always comes late, but you need them
Don’t be a dragon

Why share our data? - Because it is the future, stupid.

One of the big ideas in science will be the (orthogonal) re-use of public (big!) data
Data repository

For proteomics, I built the PRIDE repository, part of the ProteomeXchange consortium
Martens, Proteomics, 2005; Vizcaíno, Nature Biotechnology, 2014
This is when I left

The bricks in a non-profit (domain specific) data bazaar

An open data ecosystem requires standardization of the entire workflow
Masuzzo, Trends in Cell Biology, 2014

Work together on the standards and the repository, and plan well ahead
• You cannot enforce top-down standards on your own, so you need to enlist the active support of the whole community
• Support your standards with user-oriented implementations and tools to ease adoption (developer and researcher)
• Build your repository to last, and even then collaborate across continents; nobody wants a vanishing repository

Financial crisis-proofing your data is possible, but requires globally distributed data
Slotta, Nature Biotechnology, 2009; Csordas, Proteomics, 2013; Martens, Proteomics, 2013

The cavalry always comes late, but you need them

Journals are your best way to acquire data
• Journals publish your work, so it is known across the community
• Such papers are also crucial rewards for the (non-funded) work (e.g., standards, tools)
• Journals provide the key incentive for authors to submit their data

Funders are important for your lifeblood
• Funders can spread awareness by mandating a data management and sharing section in a proposal
• Funders carry only a weak stick for open data, as data sharing often comes only at the end of the grant
• If funders have a stake in open data, they will (have to?) consider long-term funding of the corresponding infrastructure

Senior scientists (demi-gods) are important for your ulcers (that is, for keeping you on your toes)
• The demi-gods are often most resistant to data sharing (as per the Martens rationale)
• They will challenge you (if not worse) throughout your career in data standardization and dissemination
• Their conversion provides satisfaction, visibility, and vindication

Don’t be a dragon

J.R.R. Tolkien, A Conversation with Smaug

Avoid the elephant graveyard; make sure that these data live
• Choose a good, solid license for the data in your system from day one; it will be challenged! (do the same for your software!)
• Make available data truly accessible: lower thresholds for data entry and data retrieval
• Make sure to have some good ideas on how to (orthogonally) re-use the data yourself

The PRIDE Reshake feature provides a direct coupling to all public data in PRIDE
Vaudel, Nature Biotechnology, 2015

My group published a lot of data re-use, mostly orthogonal, always cross-experiment
Foster, Proteomics, 2011; Colaert, Nature Methods, 2011; Barsnes, Proteomics, 2011; Vandermarliere, Proteomics, 2013; Degroeve, Bioinformatics, 2013

Importantly, deposited data are not the end; an ecosystem enables re-use as well!
5. Retrieval/dissemination from data repository
6. Multiscale and meta-scale analysis algorithms
7. Application to proof-of-concept studies
Lock, PLOS ONE, 2014; Masuzzo, Trends in Cell Biology, 2014

A sociologist’s take on our efforts towards (orthogonal) data reuse
“This desire to reactivate data is widespread, and Klie et al. are not alone in wanting to show that ‘far from being places where data goes to die’ (Klie et al., 2007: 190), such data collections can be mined for valuable information that could not be obtained in any other way.”
“In attempting to reactivate sedimented data in order to enable its re-use, their first step was ...”
“... they are experiments in seeing, in furnishing ways of seeing how data on proteins could become re-usable, could be reactivated as collective property rather than the by-product of publication.”
Mackenzie and McNally, Theory, Culture and Society, 2013

www.compomics.com
@compomics