#openminted_eu, #or2016, #tdm

Repositories in the centre of new scientific knowledge
Text Mining: the next data frontier

Natalia Manola
Athena Research & Innovation Centre

OR2016 - 13 June, 2016 - Dublin, IRELAND

Some facts about scientific literature

The global research community generates over 1.5 million new scholarly articles per annum.
The STM report (2009)

… some 90% of papers … are never cited. … 50% of papers are never read by anyone other than their authors, referees and journal editors
Lokman I. Meho, The rise and rise of citation analysis, 2007

… one paper published every 30 seconds … 70,000 papers published on a single protein, the tumor suppressor p53
Spangler et al, Automated Hypothesis Generation based on Mining Scientific Literature, 2014

Emerging solution(s)

Machine reading
• process textual sources, organise and classify in various dimensions, extract main (indexical) information items

… and “understanding”
• identify and extract entities and relations between entities, facilitate the transformation of unstructured textual sources into structured data

… and predicting
• enable the multidimensional analysis of structured data to extract meaningful insights and improve the ability to predict

What OpenMinted is about

Main objectives
Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can discover, collaboratively create, share and re-use knowledge from a wide range of text-based scientific and scholarly related sources.
A next step from Open Access to Open Science

A complex landscape
EGI Conference - Lisbon, 18-22 May 2015
• Text Mining Researchers
• Computing Infrastructures
• Content Providers
• End Users

High level architecture
Policies & guidelines

Key characteristics
• service oriented - discovery, re-use of content and tools
• build on existing TDM tools - no focus on new algorithms
• infrastructure - focus on interoperability
• community driven - user centric requirements
• open science - openness at all levels

Challenges

Discoverable & accessible content & services
• Document literature content, language/knowledge resources, data categories, taxonomies, provenance information
• Document language processing/text mining services and workflows
• Generic and domain-specific metadata descriptions

Interoperable services
• Combine services into workflows
• Combine content and language resources with services and workflows
• Combine automatic and manual/crowdsourcing annotation services

IPR and licensing
• Study IPR restrictions for reuse of sources as well as possible exceptions
• Promote clarity and standardisation of legal rights and obligations
• Translate the legal & policy aspects into specifications for lawful user-to-service and service-to-service interactions

Building on existing language resources repositories and infrastructures (META-SHARE, CLARIN)
Starting with repositories and OA publishers via OpenAIRE and CORE
Promoting existing standards, best practices and technologies
In close collaboration with the FUTURETDM project http://project.futuretdm.eu/

Scholarly Communication
• Feature extraction
• Data citation
• Research analytics

Life Sciences
• Curation of databases and lexica in Chembolomics & neuroinformatics

Agriculture
• Extracting information from tables for food safety alerts

Social Sciences
• Data citation

Community driven

From the very beginning…
Requirements, content, barriers, expected outcomes.

… to the very end
Create applications, validate and evaluate the results.

twitter.com/openminted_eu
facebook.com/openminted
bit.do/openmintedlinkedin
vimeo.com/openminted
bit.do/openmintedplus

THANK YOU!
Natalia Manola
natalia@di.uoa.gr