#openminted_eu, #or2016, #tdm
Repositories in the centre of new scientific knowledge
Text Mining: the next 
data frontier
Natalia Manola
Athena Research & 
Innovation Centre
Some facts about scientific literature
OR2016 - 13 June, 2016 - Dublin, IRELAND
The global research community generates over 1.5 
million new scholarly articles per annum.
The STM report (2009)
… some 90% of papers … are never cited. 
… 50% of papers are never read by anyone other than 
their authors, referees and journal editors
Lokman I. Meho, The rise and rise of citation analysis, 2007
… one paper published every 30 seconds
… 70,000 papers published on a single protein, the tumor suppressor p53   
Spangler et al, Automated Hypothesis Generation 
based on Mining Scientific Literature, 2014
Emerging solution(s)
Machine reading
process textual sources, organise and classify in various dimensions, extract main (indexical) information items, 
… and “understanding” 
identify and extract entities and relations between entities, facilitate the transformation of unstructured textual sources into structured data 
… and predicting
enable the multidimensional analysis of structured data to extract meaningful insights and improve the ability to predict 
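As a rough illustration of the reading and "understanding" steps above, the following minimal sketch turns a few sentences into structured records. Everything in it is an assumption made for illustration: the toy lexicon, the verb list, the sample sentences and the function names are hypothetical, and a real text mining service would rely on trained models and curated resources rather than regular expressions.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (illustration only).
ENTITY_LEXICON = {
    "p53": "PROTEIN",
    "MDM2": "PROTEIN",
    "apoptosis": "PROCESS",
}

# Hypothetical relation verbs linking two entities in one sentence.
RELATION_VERBS = ("inhibits", "activates", "regulates")


def extract_entities(sentence):
    """Return (term, type) pairs for lexicon terms found in the sentence."""
    found = []
    for term, etype in ENTITY_LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", sentence):
            found.append((term, etype))
    return found


def extract_relations(sentence):
    """Return (subject, verb, object) triples of the form 'entity verb entity'."""
    terms = "|".join(re.escape(t) for t in ENTITY_LEXICON)
    verbs = "|".join(RELATION_VERBS)
    pattern = re.compile(rf"\b({terms})\b\s+({verbs})\s+\b({terms})\b")
    return [m.groups() for m in pattern.finditer(sentence)]


def mine(sentences):
    """Turn unstructured sentences into a list of structured relation records."""
    records = []
    for sentence in sentences:
        for subject, relation, obj in extract_relations(sentence):
            records.append({"subject": subject, "relation": relation, "object": obj})
    return records


# Illustrative input sentences (not taken from any corpus).
SENTENCES = [
    "MDM2 inhibits p53.",
    "p53 regulates apoptosis in tumour cells.",
    "MDM2 inhibits p53 under normal conditions.",
]

records = mine(SENTENCES)

# A simple aggregation over the structured data, standing in for later analysis.
relation_counts = Counter(
    (r["subject"], r["relation"], r["object"]) for r in records
)
```

Once the text is reduced to records like these, counting, filtering and cross-document comparison become ordinary data operations, which is the point of moving from unstructured to structured sources.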
What OpenMinTeD is About
Main Objectives
Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can discover, collaboratively create, share and re-use knowledge from a wide range of text-based scientific and scholarly sources.
A next step from Open Access 
to Open Science
A complex Landscape
Text Mining 
Researchers
Computing 
Infrastructures
Content Providers
End Users
HIGH LEVEL ARCHITECTURE
Policies & 
guidelines
Key Characteristics
• service-oriented - discovery, re-use of content and tools
• build on existing TDM tools - no focus on new algorithms
• infrastructure - focus on interoperability
• community-driven - user-centric requirements
• open science - openness at all levels
Challenges
Discoverable & accessible content & services
• Document literature content, language/knowledge resources, data categories taxonomies, provenance information
• Document language processing/text mining services and workflows
• Generic and domain-specific metadata descriptions
Interoperable services
• Combine services into workflows
• Combine content and language resources with services and workflows
• Combine automatic and manual/crowdsourcing annotation services
IPR and licensing
• Study IPR restrictions for reuse of sources as well as possible exceptions
• Promote clarity and standardisation of legal rights and obligations
• Translate the legal & policy aspects into specifications for lawful user-to-service and service-to-service interactions
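The "combine services into workflows" challenge listed under interoperable services can be pictured with a small sketch. It assumes, purely for illustration, that every service accepts and returns the same document structure (a plain dictionary here). The service names and the `make_workflow` helper are hypothetical and do not describe the OpenMinTeD platform's actual interfaces.

```python
from functools import reduce


def tokenize(doc):
    """Hypothetical service: split the text into whitespace-separated tokens."""
    return {**doc, "tokens": doc["text"].split()}


def lowercase(doc):
    """Hypothetical service: normalise tokens to lower case."""
    return {**doc, "tokens": [t.lower() for t in doc["tokens"]]}


def count_tokens(doc):
    """Hypothetical service: attach a simple statistic to the document."""
    return {**doc, "token_count": len(doc["tokens"])}


def make_workflow(*services):
    """Chain services so that each one receives the previous one's output."""
    def run(doc):
        return reduce(lambda current, service: service(current), services, doc)
    return run


# Services compose only because they share one document format.
workflow = make_workflow(tokenize, lowercase, count_tokens)
result = workflow({"text": "Text Mining: the next data frontier"})
```

The chaining itself is trivial. The hard part, and the reason interoperability is listed as a challenge, is agreeing on the shared document format and metadata so that independently built services can be plugged together this way.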
Building on existing language resource repositories and infrastructures (META-SHARE, CLARIN)
Starting with repositories and OA publishers via OpenAIRE and CORE
Promoting existing standards, best practices and technologies
In close collaboration with the FutureTDM project
http://project.futuretdm.eu/ 
• Scholarly Comm.: Feature extraction; Data citation; Research analytics
• Life Sciences: Curation of databases and lexica in Chembolomics & neuroinformatics
• Agriculture: Extracting information from tables for food safety alerts
• Social Sciences: Data citation
Community Driven
From the very beginning…
Requirements, content, barriers, expected outcomes.
… to the very end 
Create applications, validate and evaluate the results.
twitter.com/openminted_eu
facebook.com/openminted
bit.do/openmintedlinkedin
vimeo.com/openminted
bit.do/openmintedplus
THANK YOU!
Natalia Manola
natalia@di.uoa.gr