Data Extraction and Annotation from Viticulture Publications (PDF files)
Agroknow
@openminted_eu
OPENMINTED - The Open Mining Infrastructure for Text and Data

This presentation was given at the 1st Agrohackathon event, held in Montpellier from 29/6/2016 to 1/7/2016. Its scope is to demonstrate how information can be extracted from publications on viticulture research.

Main Goals
• Extract the textual information contained in PDF files of (agricultural) bibliographic resources.
• Send the extracted information to known endpoints for semantic annotation.
• Design a modular system that is easily configurable and extensible.
• Give end users an intuitive way to semantically annotate their PDF files.

The process – Key Features
• We divided the system into 4 separate components:
  ▪ API Endpoint, where all calls are made, taking the URL of a PDF file as input.
  ▪ TextExtractor, where the actual extraction of textual information takes place.
  ▪ Controllers, which receive the extracted information and call the external endpoints.
  ▪ Mapper, which presents the returned annotations in a unified form.

PDFExtractor Workflow

Mappings – AgroPortal with FREME

Mappings – AgroPortal with OA

Proposed Evolution of OA based on FREME

Future Work
• Integrate more controllers into the workflow.
• Design a cleansing component that clears out redundant information.
• Provide a richer API endpoint for the system.
• Benchmark the various endpoints called by the controllers.
• Build a front-end web app on top to help end users.

twitter.com/openminted_eu
facebook.com/openminted
bit.do/openmintedlinkedin
vimeo.com/openminted
bit.do/openmintedplus
www.openminted.eu
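
The four components listed under "The process – Key Features" could be wired together roughly as in the sketch below. It is only an illustration under stated assumptions, not the system presented here: the names `Annotation`, `map_results`, `annotate_pdf`, `fake_extractor` and `fake_controller` are hypothetical, the extractor and controller are stand-ins that touch neither a PDF nor a network, and the `example.org` URIs are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical unified record produced by the Mapper."""
    text: str    # surface form found in the document
    uri: str     # concept URI returned by an external endpoint
    source: str  # name of the controller that produced it


# Assumed shapes: a TextExtractor maps a PDF URL to plain text, and a
# Controller maps text to raw, endpoint-specific results.
TextExtractor = Callable[[str], str]
Controller = Callable[[str], List[Dict[str, str]]]


def map_results(source: str, raw: List[Dict[str, str]]) -> List[Annotation]:
    """Mapper: convert raw controller output into unified Annotation records."""
    return [Annotation(text=r["text"], uri=r["uri"], source=source) for r in raw]


def annotate_pdf(
    pdf_url: str,
    extract_text: TextExtractor,
    controllers: Dict[str, Controller],
) -> List[Annotation]:
    """API endpoint: extract the text, call each controller, map the results."""
    text = extract_text(pdf_url)
    annotations: List[Annotation] = []
    for name, controller in controllers.items():
        annotations.extend(map_results(name, controller(text)))
    return annotations


def fake_extractor(pdf_url: str) -> str:
    """Stand-in TextExtractor: returns fixed text instead of reading a PDF."""
    return "Vitis vinifera is affected by downy mildew."


def fake_controller(text: str) -> List[Dict[str, str]]:
    """Stand-in Controller: matches one term instead of calling an endpoint."""
    term = "Vitis vinifera"
    if term in text:
        return [{"text": term, "uri": "http://example.org/concept/vitis-vinifera"}]
    return []


if __name__ == "__main__":
    found = annotate_pdf(
        "http://example.org/paper.pdf", fake_extractor, {"demo": fake_controller}
    )
    for annotation in found:
        print(annotation)
```

Passing the extractor and the controllers in as plain callables keeps the sketch modular: integrating another controller, as proposed under Future Work, would mean adding one more entry to the `controllers` dictionary.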