Text mining tools for extracting information about microbial biodiversity in food
Estelle Chaix, Louise Deléger, Robert Bossy and Claire Nédellec
firstname.lastname@inra.fr
Bibliome Team, MaIAGE, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
Microbial spoilers in food 2017, 28th-30th June, Quimper

Microbial ecosystems
Which microbes live in an environment?
Techniques:
  Culture-based methods
  Genetic methods
  Metagenomics
  Metatranscriptomics
  Shotgun sequencing
  and so on...
Helicobacter pylori, Mycobacterium avium, Escherichia coli, ...
Legionella pneumophila, Yersinia pseudotuberculosis, Aeromonas hydrophila, ...
Aspergillus flavus, Listeria seeligeri, Bacillus cereus, ...
Properties of the environment?
Microbial interactions?

Cross-referencing microbial species and habitats
Difficulty: highly variable forms in text and in genomics databases (GOLD, SRA, GenBank)
“Out of European red-smear cheese samples of various types [...] 1.2% of the samples were contaminated with L. seeligeri”
e.g.
  Artisanal cheeses from Tucuman
  Dairy cheese
  Caciocavallo cheese in Italy
“Bacteria of the genera Enterococcus and Lactobacillus and coliform bacteria were isolated from Dutch-type semi-hard cheese”

Habitat information is neither queryable nor comparable
Habitats are described at different levels of precision and are not standardized
What is the cheese microflora?
“Geotrichum candidum strains isolated from a traditional Spanish goats' milk cheese.”
“Escherichia coli O157:H7 isolated from raw beef, soft cheese and vegetables in Lima”
“Microbial ecology of Gorgonzola rinds and occurrence of different biotypes of Listeria monocytogenes.”

Classic search engine query
The query matches “cheese” and “microbe” but not “Camembert”, “Roquefort” or “Listeria monocytogenes”.
We propose a semantic search engine dedicated to microbial biodiversity in food.

Semantic search engine for microbial habitats in food
http://bibliome.jouy.inra.fr/demo/food/alvisir/webapi/search
Mini-link: https://frama.link/AlvisFood
...
Interpretation of the query
  Cheese:
  Aspergillus:
  ...

Has Aspergillus been isolated from cheese?
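The gap between a classic keyword query and a semantic query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IS_A` table is a hand-written toy hierarchy (not the real Ontobiotope ontology or NCBI taxonomy), the function names are hypothetical, and plain substring matching stands in for the real text-mining pipeline.

```python
# Hypothetical toy hierarchy (child -> parent); NOT the real Ontobiotope or NCBI taxonomy.
IS_A = {
    "camembert": "cheese",
    "roquefort": "cheese",
    "gorgonzola": "cheese",
    "cheese": "food",
    "listeria monocytogenes": "microbe",
    "geotrichum candidum": "microbe",
}


def descendants(concept):
    """Return the concept plus every term that is, transitively, a kind of it."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def keyword_match(text, terms):
    """Classic search: every query term must appear literally in the text."""
    lowered = text.lower()
    return all(term in lowered for term in terms)


def semantic_match(text, concepts):
    """Semantic search: a concept is matched by itself or by any of its descendants."""
    lowered = text.lower()
    return all(
        any(term in lowered for term in descendants(concept))
        for concept in concepts
    )


# One of the article titles quoted in the slides.
TITLE = (
    "Microbial ecology of Gorgonzola rinds and occurrence of "
    "different biotypes of Listeria monocytogenes."
)

classic_hit = keyword_match(TITLE, ["cheese", "microbe"])
semantic_hit = semantic_match(TITLE, ["cheese", "microbe"])
print("classic:", classic_hit, "semantic:", semantic_hit)
```

In this toy setting the literal query misses the title because neither “cheese” nor “microbe” occurs in it, while the expanded query finds it through “Gorgonzola” and “Listeria monocytogenes”.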
Results of the query: aspergillus cheese

Does Aspergillus live in cheese?
Result of the query: aspergillus ~livesin cheese

Behind the AlvisFood Search Engine
Our approach is to extract from text:
  “Microbe” and “Habitat” concepts
  Links between them
We use AlvisNLP:
  Methods and tools for automatic extraction and analysis of biological text (i.e. text mining and natural language processing)
  Machine learning methods trained with examples provided by microbiology and food domain experts
  Internal and external resources
AlvisFood Search Engine:
  > 100,000 references from PubMed
  Selected by MeSH terms

Microbial entity detection
NCBI taxonomy

Habitat entity detection
Detection in text of nominal or adjectival groups
Categorization of these groups with the Ontobiotope ontology
  Formal and structured representation of microbial habitats
  Partially reused in AlvisFoodSE

Food sub-categories of the Ontobiotope ontology
From the EFSA classification
Enrichment by microbial and food domain experts
Formal indication that “Roquefort” is a “Cheese” allows semantic search
Our automatic AlvisNLP tools link groups of words from the text to an Ontobiotope category to achieve normalisation

Relationship between Microbe and Habitat
Extraction of the ~livesin relationship
A hard problem in automatic language processing and artificial intelligence
Achieved by machine learning methods trained with annotated examples
Results are downloadable as a table with occurrence counts and displayed as facets

What are the taxa living in food?
A query: {taxon}* ~livesin food

To conclude
http://bibliome.jouy.inra.fr/demo/food/alvisir/webapi/search
Mini-link: https://frama.link/AlvisFood
Our tools are pioneers in the field of text mining for microbial biodiversity.
Bibliome is a research team, so:
  If you use AlvisFoodSE for your research, please cite us.
  If you see an error, please send us an email; this will help us improve our tools.

Ongoing work
Ambiguous cases for automatic tools
Automatic detection of microbial phenotypes, e.g. halophile, thermophile, phototroph ...
  “Byssochlamys fulva and Neosartorya fischeri are heat-resistant fungi which are a concern to food industries”

Acknowledgments
INRA Ontobiotope and Florilège working groups
Food Microbiome project

Thank you for your attention
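As a supplementary illustration, the relation query `{taxon}* ~livesin food` and its table of occurrence counts can be approximated as follows. This is a minimal sketch, not the AlvisFood implementation: the `RELATIONS` tuples are invented for illustration (loosely inspired by the quoted article titles), `HABITAT_PARENT` is a hypothetical stand-in for an ontology, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical extracted relations (taxon, habitat, document id); invented for illustration.
RELATIONS = [
    ("Listeria monocytogenes", "gorgonzola", "doc1"),
    ("Geotrichum candidum", "goat milk cheese", "doc2"),
    ("Escherichia coli", "raw beef", "doc3"),
    ("Escherichia coli", "soft cheese", "doc3"),
    ("Listeria monocytogenes", "soft cheese", "doc4"),
    ("Helicobacter pylori", "stomach", "doc5"),
]

# Hypothetical habitat hierarchy (child -> parent), standing in for an ontology.
HABITAT_PARENT = {
    "gorgonzola": "cheese",
    "goat milk cheese": "cheese",
    "soft cheese": "cheese",
    "cheese": "food",
    "raw beef": "food",
}


def is_within(habitat, ancestor):
    """True if the habitat is the ancestor itself or lies below it in the hierarchy."""
    current = habitat
    while current is not None:
        if current == ancestor:
            return True
        current = HABITAT_PARENT.get(current)
    return False


def taxa_living_in(ancestor):
    """Count, per taxon, the extracted relations whose habitat falls under `ancestor`."""
    counts = Counter()
    for taxon, habitat, _doc in RELATIONS:
        if is_within(habitat, ancestor):
            counts[taxon] += 1
    return counts


# Tab-separated table of taxa with their occurrence counts.
food_counts = taxa_living_in("food")
for taxon, n in food_counts.most_common():
    print(f"{taxon}\t{n}")
```

Calling `taxa_living_in("cheese")` instead narrows the same counts to the cheese sub-tree, which mirrors how a facet restricts a result set.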