OpenMinted community-driven applications
Sophia Ananiadou
National Centre for Text Mining, University of Manchester
FORCE 2017
twitter.com/openminted_eu

Engaging with the communities
•  Scholarly communications
    •  Research performance, research publications recommendation system
    •  Rock art mining; TM Leica microscopes
•  Life Sciences
    •  Metabolites, Curation of neuroscience, modeling chronic liver diseases
•  Social Sciences
•  Agriculture, Biodiversity

Methodology: application design
•  General description
•  Resources
•  Document formats
•  Knowledge bases
•  Tools, components, services
•  Deployment plan
•  Data interfaces
•  User interfaces
•  Data processing scenarios
•  Limitations
•  Release Plan

Scholarly Communications
•  Funding Mining Services
•  Rock art research
•  Frontiers
•  Ease the speed
•  XMI, JSON, PDF

Scholarly Communications
•  Research Publications Recommendation system
•  Research Excellence Trends Explorer
•  Citation counts
•  Readership counts

Life Sciences
Curation
•  Metabolites
•  Neuroscience

Life Sciences
Health State Modelling
[Diagram: health states Non-alcoholic fatty liver disease, NASH (Non-alcoholic Steatohepatitis), Diabetes, End stage liver disease, Hepatocellular carcinoma; Interventions #1 to #4]

Agriculture and Biodiversity
Text mining over bibliographic data
•  AGRIS, CORE, PDF Aggregator
•  PDF Extractor: Metadata, Outline, Figures, Captions, References
•  GrapeVineExtractor: Variety, OIV ID, Lucene Score
•  AgroVoc Extractor: AgroVoc Term, AgroVoc ID, Lucene Score
Text Mining over RSS Feeds
•  RSS Feeds: FoodSafetyNews, WaterWorld
•  RSS Feed Reader
•  GeoNames Extractor: Geoname, Geoname ID, Lucene Score, RDF Description
•  Geopolitical Extractor: Geopolitical Term, Geopolitical ID, Lucene Score

Agriculture and Biodiversity
Microbial Biodiversity; Linking Wheat Data with Literature
•  Where does Psychrobacter aquimaris usually live?
•  Is the lr34 gene related to wheat resistance to rust disease?

Agriculture and Biodiversity
•  Extracting gene regulation networks involved in seed development (SeeDev)
•  Network of AGL15 gene: http://bibliome.jouy.inra.fr/demo/seedev/alvisir/webapi/search?

Social Sciences
•  Extracting Named Entities from survey Data

Focus: Text Mining for ChEBI

Text Mining for ChEBI
•  Identifying metabolites for curation in ChEBI
•  Linking metabolites to species, chemical information

Text Mining for ChEBI
•  Majority of entries are manually curated
•  Time consuming
•  Annotator fatigue
•  Lack of completeness

Text Mining for ChEBI
Corpus Stats:
- 200 abstracts
- 100 full papers
Agreement:
- 0.934 (Entities)
- 0.779 (Relations)

Text Mining for ChEBI
•  Identification of Entities + Events
•  Models trained using corpora
http://www.nactem.ac.uk/EventMine/
Miwa, M., S. Ananiadou (2015) BMC Bioinformatics, 16 (Suppl. 10)

Focus: Text Mining for Neuroscience

Text Mining for Neuroscience
Background
•  Use these to aid curation in KnowledgeSpace
•  In collaboration with Blue Brain Project at EPFL
•  Curation for Neurolex

Text Mining for Neuroscience
[Figure: KnowledgeSpace]

Text Mining for Neuroscience
Entities of Interest
•  Brain Region
•  Ionic Current/Channel
•  Model Organism
•  Neuron
•  Scientific Units/Values

Text Mining for Neuroscience
Active Learning
1. Annotator labels (or corrects) examples
2. Examples are used to create new models
3. New models are used to automatically label new documents
4. Most informative sentences are selected

Text Mining for Neuroscience
Entity               Agreement   Total in corpus
Brain Region         0.891       1055
Neuron               0.825       767
Model Organism       0.846       299
Ionic Channel        0.639       201
Ionic Current        0.904       339
Ionic Conductance    0.810       76
Value                0.784       594
Unit                 0.902       507

Text Mining for Neuroscience
Methods
•  Dictionary Fuzzy Matching
•  Regular Expression
    ^.*(neuron(e?s)?)|(cells?)( .*)?$
    Match any phrase containing the strings 'Neuron', 'Neurone', 'Neurons', 'Neurones', or 'cells'.

Entry          Match                Type
Brown Rat      Brown Rat            Exact Match
c elegans      C.Elegans            Fuzzy Match
Drosophilia    Young Drosophilia    Fuzzy Match

Text Mining for Neuroscience
Methods
•  Conditional Random Field
    •  Dictionary Features
    •  NER Suite – Generic Model
•  Deep Learning NER
    •  Neural Architecture
    •  Data Driven

Text Mining for Neuroscience
Entity               Rules / Dictionaries   CRF     Deep Learning
Brain Region         0.314                  0.822   0.844
Neuron               0.269                  0.757   0.814
Model Organism       0.435                  0.844   0.869
Ionic Channel        0.278                  0.600   0.800
Ionic Current        0.118                  0.690   0.764
Ionic Conductance    0.070                  0.364   0.813
Value                0.289                  0.867   0.860
Unit                 0.348                  0.929   0.930

Text Mining for Neuroscience
Examples (figures)

Thank You
WP9 – Use case Scenarios and applications
Sophia Ananiadou
sophia.ananiadou@manchester.ac.uk
www.openminted.eu
twitter.com/openminted_eu
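
Sketch: dictionary fuzzy matching and the regular-expression rule
The two rule-based methods listed under Methods (dictionary fuzzy matching and the neuron/cell regular expression) can be illustrated roughly as below. This is a minimal sketch in Python, not the project's implementation. The function names (tokenize, match_type), the 0.85 similarity threshold, the use of the standard-library difflib module, and the explicitly grouped form of the pattern are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

# Assumed, explicitly grouped variant of the neuron/cell pattern:
# both alternatives sit inside one group so the anchors apply to each.
NEURON_RULE = re.compile(r"^.*(neurone?s?|cells?)( .*)?$", re.IGNORECASE)


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_tokens(needle, haystack):
    """True if the token list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    if n == 0:
        return False
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def match_type(entry, mention, threshold=0.85):
    """Classify a mention against a dictionary entry (hypothetical helper).

    Returns "Exact Match", "Fuzzy Match", or None.
    """
    if entry == mention:
        return "Exact Match"
    entry_tokens, mention_tokens = tokenize(entry), tokenize(mention)
    if not entry_tokens or not mention_tokens:
        return None
    # Same tokens after normalisation, or the entry embedded in a longer phrase.
    if contains_tokens(entry_tokens, mention_tokens):
        return "Fuzzy Match"
    # Otherwise fall back to character-level similarity (tolerates small typos).
    ratio = SequenceMatcher(
        None, " ".join(entry_tokens), " ".join(mention_tokens)
    ).ratio()
    return "Fuzzy Match" if ratio >= threshold else None


if __name__ == "__main__":
    pairs = [
        ("Brown Rat", "Brown Rat"),
        ("c elegans", "C.Elegans"),
        ("Drosophilia", "Young Drosophilia"),
    ]
    for entry, mention in pairs:
        print(entry, "|", mention, "|", match_type(entry, mention))
    for phrase in ["hippocampal interneurons", "Purkinje cells", "neuronal activity"]:
        print(phrase, "|", bool(NEURON_RULE.match(phrase)))
```

Under these assumptions the three dictionary pairs from the Methods table come out as exact, fuzzy, and fuzzy respectively, and the grouped pattern accepts phrases ending in or containing a neuron/cell token while rejecting words such as "neuronal".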