bytes, sweat and tears:
what it takes to build an open data exchange ecosystem
lennart martens
lennart.martens@vib-ugent.be
computational omics and systems biology group
VIB / Ghent University, Ghent, Belgium

Why share our data? - Because it is the future, stupid.
The bricks in a non-profit (domain specific) data bazaar
The cavalry always comes late, but you need them
Don’t be a dragon
Why share our data? - Because it is the future, stupid.
One of the big ideas in science will be
the (orthogonal) re-use of public (big!) data
Data repository
For proteomics, I built the PRIDE repository,
part of the ProteomeXchange consortium
Martens, Proteomics, 2005; Vizcaíno, Nature Biotechnology, 2014
This is when I left
The bricks in a non-profit (domain specific) data bazaar
An open data ecosystem requires 
standardization of the entire workflow
Masuzzo, Trends in Cell Biology, 2014
Go it together for the standards and repository
and plan well enough ahead
• You cannot enforce top-down standards on 
your own, so you need to enlist the active 
support of the whole community
• Support your standards with user-oriented 
implementations and tools to ease adoption 
(developer and researcher)
• Build your repository to last, and even then 
collaborate across continents; nobody wants 
a vanishing repository
Financial crisis-proofing your data is possible, 
but requires globally distributed data
Slotta, Nature Biotechnology, 2009; Csordas, Proteomics, 2013; Martens, Proteomics, 2013
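Globally distributed copies only protect the data if the copies actually agree. As a minimal sketch (not how any particular repository does it), the check below compares SHA-256 checksums of the same file held at several mirrors and reports the sites that differ from the majority. The mirror names, the toy file contents, and the helper `find_divergent_replicas` are all hypothetical and serve only as illustration.

```python
import hashlib
from collections import Counter


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 checksum of a file's bytes as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def find_divergent_replicas(replicas: dict[str, bytes]) -> list[str]:
    """Return the (sorted) mirror names whose copy differs from the majority checksum."""
    digests = {site: sha256_hex(data) for site, data in replicas.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, digest in digests.items() if digest != majority_digest)


# Hypothetical mirrors holding the same (toy) dataset file.
replicas = {
    "europe": b"peptide\tscore\nPEPTIDEK\t0.99\n",
    "north_america": b"peptide\tscore\nPEPTIDEK\t0.99\n",
    "asia": b"peptide\tscore\nPEPTIDEK\t0.90\n",  # stale or corrupted copy
}

divergent = find_divergent_replicas(replicas)
print(divergent)
```

A majority vote is the simplest possible policy; with only two mirrors it cannot tell which copy is wrong, which is one more argument for spreading the data over at least three sites.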
The cavalry always comes late, but you need them
Journals are your best way to acquire data
• Journals publish your work, so it is known 
across the community
• Such papers are also crucial rewards for the 
(non-funded) work (e.g., standards, tools)
• Journals provide the key incentive for 
authors to submit their data
Funders are important for your lifeblood
• Funders can spread awareness by mandating 
a data management and sharing section in a 
proposal
• Funders carry only a weak stick for open data, 
as it often comes at the end of the grant
• If funders have a stake in open data, they will 
(have to?) consider long-term funding of the 
corresponding infrastructure
Senior scientists (demi-gods) are important 
for keeping you on your toes (and for your ulcers)
• The demi-gods are often most resistant to 
data sharing (as per the Martens rationale)
• They will challenge you (if not worse) 
throughout your career in data 
standardization and dissemination
• Their conversion provides satisfaction, 
visibility, and vindication
Don’t be a dragon
J.R.R. Tolkien, A Conversation with Smaug
Avoid the elephant graveyard;
make sure that these data live
• Choose a good, solid license for the data in 
your system from day one; it will be 
challenged! (do the same for your software!)
• Make available data truly accessible: lower 
thresholds for data entry and data retrieval
• Make sure to have some good ideas on how 
to (orthogonally) re-use the data yourself
The PRIDE Reshake feature provides
a direct coupling to all public data in PRIDE
Vaudel, Nature Biotechnology, 2015
My group has published many data re-use studies, 
mostly orthogonal, always cross-experiment
Foster, Proteomics, 2011; Colaert, Nature Methods, 2011; Barsnes, Proteomics, 2011;
Vandermarliere, Proteomics, 2013; Degroeve, Bioinformatics, 2013
Importantly, deposited data are not the end;
an ecosystem enables re-use as well!
5. Retrieval/dissemination from data repository
6. Multiscale and meta-scale analysis algorithms
7. Application to proof-of-concept studies
Lock, PLOS ONE, 2014; Masuzzo, Trends in Cell Biology, 2014
A sociologist’s take on our efforts
towards (orthogonal) data reuse
“This desire to reactivate data is widespread, and Klie et al. are not 
alone in wanting to show that ‘far from being places where data goes 
to die’ (Klie et al., 2007: 190), such data collections can be mined for 
valuable information that could not be obtained in any other way.”
“In attempting to reactivate sedimented data in order to enable its 
re-use, their first step was ...”
“... they are experiments in seeing, in furnishing ways of seeing how 
data on proteins could become re-usable, could be reactivated as 
collective property rather than the by-product of publication.”
Mackenzie and McNally, Theory, Culture and Society, 2013

www.compomics.com
@compomics