What could Open Science really mean for Chemistry?
Simon Coles (s.j.coles@soton.ac.uk)
Director, UK National Crystallography Service

Why do we do research?
To further our understanding
• New thinking, methods, products, areas
• It’s best to do this with the help of others
• Competition, collaboration, cooperation, communication, circulation

21 r’s
• scientific method – reproducible, repeatable, replicable, reusable
• access – referenceable, retrievable, reviewable
• understanding – replayable, reinterpretable, reprocessable
• new use – recomposable, reconstructable, repurposable
• social – reliable, respectful, reputable, revealable
• curation – recoverable, restorable, reparable, refreshable
David DeRoure, 2015

• Does the system we use actually enable us to do this?
• Arguably not – and even worse, it is increasingly being used for other purposes instead!
• We end up ‘gaming’ the system
– Publish or Perish: all about furthering one’s career
– Reputation and ego
– Reputation and institution
– Funding depends on outputs

• We have Open Access – doesn’t that solve everything?
• Can I go back to the lab now?
• Er, no! That only gives us about two r’s – what about the others?!

Quality: peer review
Peer review has been around for ages… Myth!
“We had sent you our manuscript for publication and had not authorized you to show it to specialists before it is printed. I see no reason to address the in any case erroneous comments of your anonymous expert. On the basis of this incident I prefer to publish the paper elsewhere.”
A. Einstein, submission to Phys. Rev., 1936
• Nature introduced peer review in 1967 (prior to this, many articles were accepted based on editorial judgement)
• As late as 1938, Science relied on personal solicitations for virtually all article submissions

Why did peer review become ubiquitous?
• Increasing specialisation of research – individual editors couldn’t keep up!
• Increase in volume of research being published – mid-20th-century increase in scientists and technology
• Because papers could be copied and distributed
Does all this sound familiar…?!

We use the opinion of a large number of others all the time

Nature Open Peer Review Experiment

2066 ‘Future history’
Blog Post: The end of the article
• it was no longer possible to include the evidence in the paper
• it was no longer possible to reconstruct a scientific experiment based on a paper alone
• writing for increasingly specialist audiences restricted essential multidisciplinary reuse
• research records needed to be readable by computer to support automation and digital curation

2066…
• single authorship gave way to casts of thousands of collaborators and citizen scientists, leading to failure of the authorship incentive model
• quality control models scaled poorly with the increasing volume and open access movement, obscuring innovation
• alternative reporting was found necessary for compliance with increasingly stringent scientific and industrial regulations
• frustrated by inefficiencies in scholarly communication that stifled progress, research funders demanded change.

Back to 2016 (or was it 1980?!)
• Many stakeholders, all of whom interact with one another, provide different services and expect different things in return.
• A number of functions beyond publication of research outcomes are fulfilled…
– Quality assurance and validation (peer review)
– Building of researcher reputation
– Ranking of institutions
– Allocation of funding
– Socio-economic impact
• At the heart of this is the researcher, without whom the system falls apart

Maybe we have a solution to change this?
Simpson, H. The Simpsons (2005), Eds. Groening, M., Brooks, J.L. & Simon, S., Series 16, Episode 8, Original air date (US) 06-Feb-2005.
• Can these interactions be rearranged to solve the problems, e.g. the serials crisis, problems with peer review, lack of reproducibility?
• … and encourage greater openness too!
• Researchers perform many of the functions: disintermediation proposes stripping out intermediaries, e.g. publishers, so researchers can perform their roles independently.
• The Web provides a flexible and open channel to enable disintermediation…
• … but this is a huge paradigm shift in how scholarly communication occurs, shifting roles which have formed over hundreds of years of conventional publishing.

Let’s start from a new beginning
• Why not take data into account?
• It’s at the root of everything…
• In fact: why not make data a stand-alone, first-class citizen in the world of research outputs?

Disaggregation

‘Database Publication’

Open Access Data Publishing

Why not do it yourself?

Need to go all the way back
• Also need RAW data, software and process details

It’s all about process
“In theory, there is no difference between theory and practice. But, in practice, there is.” Yogi Berra

Laboratory Notebooks
• All this is not a new problem…

A thorough record of research

Da Vinci didn’t get it right though…

LabTrove

Imposing structure – Planning & Enactment Ontology
• Plan (Prospective provenance)
• Enactment (Retrospective provenance)
• Realisation

oreChem Plan (for eCrystals)
• Machine-readable representation of methodology
• Describes requirements for materials, software, data products

Capturing process

Making Plans!
• Conditions / Description
• Reaction Scheme
• Equipment
• Reagents / Quantities
• Steps
• Create / Export

In the ‘wet lab’: Notelus
• Import plan / Open new notebook
• Generate / review steps, equipment, scheme & materials
• Conduct experiment
• Export record

What more do we get out of this?
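One thing a structured plan and record make possible is an automatic comparison of what was intended (prospective provenance) with what was actually done (retrospective provenance). The following is a minimal sketch of that idea only. All class, field and method names here are hypothetical, the reagents are placeholders, and this is not the oreChem, LabTrove or Notelus schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PlanStep:
    """One intended step (prospective provenance), written before the experiment."""
    number: int
    instruction: str


@dataclass
class Plan:
    """Hypothetical plan: description, reagents/quantities, equipment and steps."""
    title: str
    reagents: dict    # reagent name -> planned quantity (placeholder values)
    equipment: list
    steps: list       # list of PlanStep


@dataclass
class EnactedStep:
    """What actually happened (retrospective provenance), logged at the bench."""
    plan_step: int
    observation: str
    deviated: bool = False


@dataclass
class Enactment:
    """A notebook record tied to the plan it was generated from."""
    plan: Plan
    record: list = field(default_factory=list)

    def log(self, plan_step, observation, deviated=False):
        self.record.append(EnactedStep(plan_step, observation, deviated))

    def unrecorded_steps(self):
        # Planned steps that have no entry in the record yet.
        done = {entry.plan_step for entry in self.record}
        return [step.number for step in self.plan.steps if step.number not in done]

    def deviations(self):
        # Planned steps where the record was flagged as departing from the plan.
        return [entry.plan_step for entry in self.record if entry.deviated]

    def export(self):
        # Machine-readable export of plan and record together.
        return json.dumps(asdict(self), indent=2)


plan = Plan(
    title="Illustrative reaction (placeholder values)",
    reagents={"reagent A": "1.0 mmol", "reagent B": "1.2 mmol"},
    equipment=["round-bottom flask", "stirrer hotplate"],
    steps=[
        PlanStep(1, "Combine reagents"),
        PlanStep(2, "Stir for 2 h"),
        PlanStep(3, "Work up"),
    ],
)

run = Enactment(plan)
run.log(1, "Reagents combined")
run.log(2, "Stirred for 3 h instead of 2 h", deviated=True)

print("Not yet recorded:", run.unrecorded_steps())
print("Deviations from plan:", run.deviations())
```

Keeping the plan inside the exported record means a later reader, or a program, can see both what was meant to happen and what did, without relying on the prose of a paper.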
elnItemManifest – a need for standards
• 3-layer metadata model for description, export & packaging
• Published openly at http://wp.me/p2JoQ6-xF (Dial-a-Molecule) & Journal of Cheminformatics, 2013, 5:52. DOI: 10.1186/1758-2946-5-52

Integration of data sources

Actively assist process ‘on the spot’
• CREAM project

Big data…
• Enabling ChemInformatics – a whole new area of chemistry just waiting to be exploited
• 70 million compounds in CAS
– But how many could there be – informing virtual libraries?!
• 10 million reactions in databases
– But how many have really been performed?!
• Artificial Intelligence…

Right now we need more (accessible) data
• Chemical space is big – really big
• We’ve only covered a small part of it!
• Because we don’t publish everything!
• Even worse – because we don’t do everything!!

Grand Challenges
• How can we make molecules in DAYS, not YEARS?
• We want to be able to instantly optimise or predict the outcome of a reaction
– Based on what we already know
– Then extrapolate
• Dial-a-Molecule: A network to think about and address this problem…
• http://generic.wordpress.soton.ac.uk/dial-a-molecule/

Dial-a-Molecule
[Diagram labels: PLAN, EXECUTE, Past Data, Theoretical Predictions, New ‘Perfect’ Reactions, New Reaction Systems, Artificial Intelligence, Data Capture, Statistical Analysis, Real-time Analysis/Optimisation]

Thanks