Agenda

Time | Title | Presenter
10:00 - 10:10 | Aggregating Research Papers from Publishers’ Systems to Support Text and Data Mining | Nancy Pontika, The Open University
10:10 - 10:20 | Machine Accessibility of Open Access Research Publications from Publisher Systems via ResourceSync | Petr Knoth, The Open University
10:20 - 10:30 | Improve interoperability across publisher platforms: a legal view | Giulia Dore, University of Glasgow
10:30 - 10:50 | Hands-on workshop | Moderators: Petr Knoth, Giulia Dore, Simone Sacchi
10:50 - 11:00 | Workshop reporting | Workshop participants

Aggregating Research Papers from Publishers’ Systems to Support Text and Data Mining

Dr. Nancy Pontika
Knowledge Media Institute, The Open University
@nancypontika
FORCE 2017, Berlin, 25-27 October 2017
@openminted_eu

Goal

Achieve seamless programmable access to full texts of open access research papers from publisher platforms for text mining

Why machine accessibility of publications?

• TDM can only fulfil its potential if TDM tools can be applied:
  • to the widest possible set of publications
  • as soon as publications are made available
• Many publication providers => need for interoperability

Current state of machine access to research literature

• Aggregating publications from:
  • Repositories (Green OA)
  • Open access journals (Gold OA)
  • OAI-PMH
• https://core.ac.uk

Current state of accessing research literature

• ~80 million metadata records
• 8.5 million full-text records
• ~1.5 million monthly active users
• Different services:
  • API
  • Data dumps
  • Recsys
  • Analytics
  • …
• But there is an issue here …
• https://core.ac.uk

The idea of the Publisher Connector

Provide seamless access over non-standard APIs
https://core.ac.uk

Key challenge

Analyse how to scalably aggregate Open Access publications from publishers

Approach

1. Surveyed publishers for machine accessibility of Open Access research content
2. Technically validated their answers
3. Implemented connectors to publishers’ systems
4. Addressed scalability issues
5. Encourage providers to follow good practices
  • Validation tools, advocacy

Publishers Survey: publication models

• Models: Toll Access, Open Access
• Publishers: Elsevier, Palgrave Macmillan, Cambridge University Press, Royal Society of Chemistry, HighWire Press, Publishing Technology Plc., Dove Medical Press, IOP Publishing, PeerJ, eLife Sciences, Frontiers

Results: access to full text (I)

• Chart: Automated downloads of OA full-text
  • Legend: Website, API, FTP
  • Values shown: 54%, 31%, 15%
• Chart: Restricting access to full-text
  • Legend: Don't restrict access in any way, Specify a crawl delay, Allow access to specific robots
  • Values shown: 75%, 13%, 13%

Results: access to full text (II)

• Chart: Accessing full-text by harvesting the website
  • Legend: Major search engines, Recognised services upon approval, Everyone, Allowed subject to fair use, Everyone upon approval
  • Values shown: 35%, 24%, 18%, 12%, 12%
• Chart: Reference to an article’s full-text in metadata
  • Legend: Direct link to full-text, Interface supporting full-text transfer, Provide DOI, Link to "
  • Values shown: 39%, 11%, 39%, 11%

Results: access to full text (III)

• Chart: Standards for accessing content
  • Legend: OAI, Own API, Z39.50
  • Values shown: 50%, 42%, 8%
• Chart: File formats
  • Legend: PDF, HTML, Plain text, HTML, JSON
  • Values shown: 36%, 24%, 4%, 32%, 4%

Conclusion

The survey revealed that there is no interoperability across harvesters for harvesting metadata and content, and that the development of a publisher connector is therefore a necessity.
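
The connector idea from the deck (seamless access over non-standard APIs) can be pictured as an adapter pattern: each publisher gets a small connector that translates its own access method into one common record shape, and the aggregator only talks to that common shape. The following is a minimal sketch under stated assumptions, not the actual CORE implementation. The class names, the `Record` fields, and the input field names (`doi`, `name`, `pdf_link`, `identifier`, `title`) are all hypothetical, and the data is passed in as plain dictionaries rather than fetched over the network.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Record:
    """Common record shape returned by every connector (hypothetical)."""
    identifier: str               # e.g. a DOI
    title: str
    full_text_url: Optional[str]  # None when the source exposes no direct link


class PublisherConnector(ABC):
    """Adapter that hides one publisher's access method."""

    @abstractmethod
    def records(self) -> Iterator[Record]:
        """Yield this publisher's records in the common shape."""


class OwnApiConnector(PublisherConnector):
    """Adapts items from a publisher-specific API (field names are invented)."""

    def __init__(self, items: Iterable[Dict[str, str]]) -> None:
        self._items = list(items)

    def records(self) -> Iterator[Record]:
        for item in self._items:
            yield Record(item["doi"], item["name"], item.get("pdf_link"))


class OaiConnector(PublisherConnector):
    """Adapts OAI-style metadata that carries an identifier but no full-text link."""

    def __init__(self, items: Iterable[Dict[str, str]]) -> None:
        self._items = list(items)

    def records(self) -> Iterator[Record]:
        for item in self._items:
            yield Record(item["identifier"], item["title"], None)


def aggregate(connectors: Iterable[PublisherConnector]) -> List[Record]:
    """Collect records from all connectors, keeping the first one per identifier."""
    seen: Dict[str, Record] = {}
    for connector in connectors:
        for record in connector.records():
            seen.setdefault(record.identifier, record)
    return list(seen.values())


if __name__ == "__main__":
    api = OwnApiConnector([
        {"doi": "10.0000/example.1", "name": "Paper A",
         "pdf_link": "https://publisher.example/a.pdf"},
    ])
    oai = OaiConnector([
        {"identifier": "10.0000/example.1", "title": "Paper A"},
        {"identifier": "10.0000/example.2", "title": "Paper B"},
    ])
    for record in aggregate([api, oai]):
        print(record)
```

Keeping the first record seen per identifier is only one possible merge policy; a real aggregator would more likely prefer whichever source exposes a direct full-text link.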