Introduction
Information on food microbial biodiversity is scattered across millions of scientific papers (2 million references in the PubMed bibliographic database in 2017). An exhaustive manual analysis of these documents is impossible. Text-mining and knowledge-engineering methods can help researchers find relevant information.

Material & Methods
We propose to study bacterial biodiversity using text-mining tools from the Alvis platform. First, we analyzed terms that designate microorganism and habitat entities in text. Microorganism names were predicted using the NCBI taxonomy. Habitat entities were detected using the syntactic structure of the terms and the OntoBiotope ontology, which has been specifically enriched to recognize food terms in text. Second, we predicted links between microorganisms and their habitats (labeled "Lives_in" relationships) using pattern-based and machine-learning-based methods. The text-mining predictions are indexed and presented in a semantic search engine.

Results
The AlvisIR search engine for microbe literature gives online access to 1.2 million PubMed abstracts (as of 2015), 13% of which are specific to food. This tool makes it possible to use text-mining results to search for information on bacterial biodiversity. It covers all types of microbial habitats, which helps to understand the origin of microbial presence in food.

Significance
This work presents the first semantic search engine dedicated to better understanding microbial food biodiversity from text.
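The Material & Methods step pairs two tasks: recognizing microorganism and habitat terms, then linking them with Lives_in relations. The sketch that follows is a minimal, hypothetical illustration of dictionary-based term recognition followed by a single pattern rule; it is not the Alvis platform's implementation. The toy lexicons, the trigger words and the function names (`find_entities`, `lives_in`) are assumptions made for illustration. The actual pipeline relies on the full NCBI taxonomy, the OntoBiotope ontology, syntactic analysis of habitat terms and machine-learning methods, none of which are reproduced here.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons: stand-ins for names from the NCBI taxonomy
# and food habitat terms from the OntoBiotope ontology.
MICROORGANISMS = ["Listeria monocytogenes", "Lactobacillus plantarum"]
HABITATS = ["raw milk", "cheese", "fermented sausage"]

# Hypothetical trigger pattern that signals a Lives_in relation.
TRIGGER = re.compile(r"\b(?:isolated|detected|found)\s+(?:from|in)\b", re.IGNORECASE)


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # "Microorganism" or "Habitat"
    start: int
    end: int


def find_entities(sentence: str) -> list[Entity]:
    """Dictionary lookup: return every lexicon term found in the sentence, in text order."""
    entities = []
    for label, terms in (("Microorganism", MICROORGANISMS), ("Habitat", HABITATS)):
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, sentence, re.IGNORECASE):
                entities.append(Entity(m.group(0), label, m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


def lives_in(sentence: str) -> list[tuple[str, str]]:
    """Pattern rule: a microorganism, then a trigger phrase, then a habitat, in that order."""
    entities = find_entities(sentence)
    pairs = []
    for mic in (e for e in entities if e.label == "Microorganism"):
        for hab in (e for e in entities if e.label == "Habitat" and e.start > mic.end):
            if TRIGGER.search(sentence[mic.end:hab.start]):
                pairs.append((mic.text, hab.text))
    return pairs


print(lives_in("Listeria monocytogenes was isolated from raw milk and cheese."))
# → [('Listeria monocytogenes', 'raw milk'), ('Listeria monocytogenes', 'cheese')]
```

A single hand-written rule like this one is precise but misses relations expressed in other ways (passive constructions, coordination across clauses, habitats mentioned before the organism), which is one reason a pattern approach is usually combined with machine learning.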

---------

How to cite: Chaix, E. (corresponding author), Deleger, L., Bossy, R., Nédellec, C. (2017). Text mining tools for extracting information about microbial biodiversity in food. In: Microbial Spoilers in food (p. 34). Presented at Microbial Spoilers in Food 2017, Quimper, FRA (2017-06-28 - 2017-06-30).




Authors: Estelle Chaix, Louise Deléger, Robert Bossy, Claire Nédellec
Publication year: 2017
Language: English (EN)
Level of knowledge: Introductory: no previous knowledge is required
Usage rights:

Attribution - CC-BY

Audience