Applying Text Mining Services to Facilitate Discovery and Linking of Wheat Scientific Information and Data
@openminted_eu
Agroknow
OPENMINTED - The Open Mining Infrastructure for Text and Data

This presentation demonstrates how text mining services can be applied to support the discovery and interlinking of wheat scientific information. The process is supported by a set of useful endpoints.

Specific Use Cases
• Let us consider a real-world problem.
• Consider an organization (perhaps yours?) that holds research information and data in different databases.
• How could we connect these data silos?
• We need to:
  - define a workflow,
  - design a data model,
  - implement it!

Proposed Solution
• Develop a layer, running on top of the data silos, that:
  - harvests the stored data,
  - aligns it to a uniform internal format,
  - enriches and interlinks it with external systems,
  - indexes the enriched data,
  - provides it back through a search API.

Proposed Solution – Complete Picture
• We need a data model to support our workflow.

Data Model (1/2)
• Everything is an object.
• Each object has a specific type, with its own properties.
• Of course, everything is interlinked.
• Any new content type can be added on the second level with its own specific properties.
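
Data Model – Illustrative Sketch
The data model and workflow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the actual schema or implementation behind the presented system: the names `Entity`, `Store`, and `align`, the field names (`key`, `name`, `kind`), and the sample records are all hypothetical. It shows generic typed objects with properties, bidirectional links between them, alignment of a silo-specific record to a uniform format, and a naive search over the stored objects.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Generic object: every record is an Entity with a type (hypothetical model)."""
    id: str
    type: str  # e.g. "publication", "person", "dataset"
    properties: dict = field(default_factory=dict)
    links: set = field(default_factory=set)  # ids of related objects


class Store:
    """In-memory stand-in for the layer running on top of the data silos."""

    def __init__(self):
        self.objects = {}

    def add(self, obj):
        self.objects[obj.id] = obj
        return obj

    def link(self, a_id, b_id):
        # Links are stored in both directions so either side can be traversed.
        self.objects[a_id].links.add(b_id)
        self.objects[b_id].links.add(a_id)

    def search(self, term):
        # Naive substring match over property values; a real system
        # would query a search index instead.
        term = term.lower()
        return sorted(
            o.id for o in self.objects.values()
            if any(term in str(v).lower() for v in o.properties.values())
        )


def align(raw, source):
    """Map a silo-specific record (assumed keys) onto the uniform format."""
    return Entity(
        id=f"{source}:{raw['key']}",
        type=raw.get("kind", "publication"),
        properties={"title": raw["name"], "source": source},
    )


# Usage: harvest one record, align it, link it to a person, then search.
store = Store()
pub = store.add(align({"key": "42", "name": "Rust resistance in wheat"}, "siloA"))
person = store.add(Entity("person:1", "person", {"name": "J. Doe"}))
store.link(pub.id, person.id)
print(store.search("wheat"))
```

Keeping a single generic object class with a `type` field is one way to let new content types be added without changing the storage layer; only the set of expected properties per type would differ.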

Data Model (2/2)

Enrichment Process
• The process:
  - use the raw information stored in resources (title, abstract, author/publisher list, full text, etc.),
  - recognize entities using various endpoints,
  - interlink them with both internal entities and external systems.

Enrichment Process – Initial State

Enrichment Process – Entity Recognition

Enrichment Process – Entity Interlinking

Some statistics using this process (and only 3 endpoints):

Useful Endpoints (1/2)
• FREME API: used for topic extraction and annotation (against AGROVOC) and for entity recognition (person, organization, location).
• GeoNames API: used for location extraction and interlinking.
• OpenAIRE mining service (part of the OpenMinTeD project): can be used to mine projects from text, data citations, classification, etc.

Useful Endpoints (2/2)
• CropOntology: can be used to extract wheat trait entities.
• PDF Text Extraction (and annotation) service: used to extract text from PDF files and annotate it using various endpoints (1st prize in the 1st AgroHackathon, Montpellier, 29/6-1/7/2016).

Outcomes Using these Technologies
• AKIF Search API
• CIMMYT MetaSearch API

Analytics (1/3)

Analytics (2/3)

Analytics (3/3)

twitter.com/openminted_eu
facebook.com/openminted
bit.do/openmintedlinkedin
vimeo.com/openminted
bit.do/openmintedplus
www.openminted.eu