Towards effective research recommender systems

Petr Knoth
CORE, Knowledge Media Institute, The Open University, United Kingdom

In this talk
• Why recommender systems for research
• Challenges in building recsys in research
• The CORE recommender system
• Our work on CPA-based recommender systems

Based on the following papers
• Knoth, P., Anastasiou, L., Charalampous, A., Cancellieri, M., Pearce, S., Pontika, N. and Bayer, V. (2017) Towards effective research recommender systems for repositories, Open Repositories 2017
• Hristakeva, M., Kershaw, D., Rossetti, M., Knoth, P., Pettit, B., Vargas, S. and Jack, K. (2017) Building recommender systems for scholarly information, Workshop: Scholarly Web Mining (SWM) at Tenth ACM International Conference on Web Search and Data Mining (WSDM2017), Cambridge, UK
• Knoth, P., Khadka, A. Can we do better than Co-Citations? - Bringing Citation Proximity Analysis from idea to practice in research article recommendation. (under review)

Why recommender systems for research

What use case do research recommender systems address?
“As a user, I want to receive recommendations about content (papers, datasets, software, people to follow, grant opportunities, methods, conferences, etc.) that is of interest to me, so I can continuously increase knowledge in my field.” [COAR Next Generation Repositories WG, 2017]

What effect can it have?
• Increase the accessibility (Azzopardi & Vinay, 2008) of resources in repositories
• People access resources on CORE via its recommender system twice as often as via search.

Why is it needed?
An essential glue to link related content from a global distributed network of digital libraries.

Challenges in building recsys for research

Common recsys approaches
• Collaborative filtering (CF) vs content-based filtering (CBF)
• Personalised vs non-personalised

Aspects to optimise a recommender system for
• Type of recs:
  • Novelty/recency
  • Relevance
  • Familiarity
  • Serendipity
• Audience: post-doc, lecturer, professor, etc.
• Use case: reviewing literature, staying in touch, learning about a new field
More information at: http://dl.acm.org/citation.cfm?id=3057152

Challenges in building recsys for research?
• Fast access to a global pool of research literature
• Access to user interaction data for:
  • CF
  • Personalisation
• Ground truths, online testing environments
• Global sign-on (across digital libraries) vs privacy

The CORE recommender system

Recommendations as a service
CORE provides a non-personalised CBF-based recommendation system for articles from across the global network of repositories.
• 8 million full texts
• 77 million metadata records
• 2683 repositories
OR2017 paper: https://arxiv.org/abs/1705.00578
Recs delivered using:
• a plugin (EPrints, DSpace + general repository)
• over the CORE API

Recommendable items in the CORE recommender

How does the CORE recommender work?
• Currently only an article-article recommender system
• Enrichment (e.g. identifiers, document type and citation data) prior to recsys
• Features:
  • Textual: title, authors, abstract, full text!
  • Recency: publication year
  • Popularity: citations, readership
  • Document type (thesis/article/slides)
  • Others: subject field, Citation Proximity Analysis/Co-citations (in progress)
• Post-filtering using record quality
• Feedback (crowdsourcing a black list)

Combining features
• Evaluating different ranking functions (P, R, MAP, NDCG, etc.):
  • Weights for boosting
  • Scaling function (e.g. exp decay for recency)
• Offline ground truths:
  • MAG citation assumption
  • MAG co-citation assumption
• Learning to rank (not done yet)
• Online A/B testing (not done yet)

Distinctive features of the CORE recommender
• Use of full text rather than just abstracts
• Ensure open access availability of recommended articles (recsys for all)
• Free service
• Integration with repositories and availability over API
More information: https://arxiv.org/abs/1705.00578

Our work on CPA-based recommender systems

Can we do better than co-citations?
• Build on the idea introduced by Beel et al. [1], putting the concept of Citation Proximity Analysis (CPA) into practice.
• Extending the co-citation assumption (“the more often two articles are co-cited in a document, the more likely they are related”) by taking proximity into account
• Introducing and evaluating new proximity functions

Method

Proximity functions
1. Co-citations baseline (no proximity information)
2. ProxMin
3. ProxSum
4. ProxMean

Initial evaluation
• Corpus of 350k papers from CORE
• 6 topics, 5 recs each, 4 systems, 10 annotators (1,200 judgements), agreement measured with Fleiss’s kappa
• Improvement of 25% over the baseline in P@5 for ProxSum

Conclusions
• Research recommenders have value, but their potential is not fully realised yet
• Data availability and interoperability needed:
  • Recommendable content/entities
  • User interaction data and profiles
• Recommendations as a service: CORE recommender
• Potential for looking into new features, such as CPA
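
Illustration: proximity-weighted co-citation
The slides name four scoring functions (co-citation baseline, ProxMin, ProxSum, ProxMean) but do not define them. The Python sketch below is one possible reading, not the method from the paper under review. Everything in it is an assumption made for illustration: the input layout (each citing document as a list of (cited_id, position) mentions), the weight 1 / (1 + distance), and the names pair_scores, recommend and docs.

```python
from collections import defaultdict
from itertools import combinations


def pair_scores(documents, mode="cocitation"):
    """Score pairs of cited papers across citing documents.

    documents: list of citing documents, each a list of
    (cited_id, position) mentions. Position is an offset in the
    citing text (assumed layout, not taken from the slides).
    """
    scores = defaultdict(float)
    for mentions in documents:
        # Group mention positions by cited paper within this document.
        positions = defaultdict(list)
        for cited_id, pos in mentions:
            positions[cited_id].append(pos)
        for a, b in combinations(sorted(positions), 2):
            # Assumed proximity weight: closer mentions weigh more.
            weights = [
                1.0 / (1.0 + abs(pa - pb))
                for pa in positions[a]
                for pb in positions[b]
            ]
            if mode == "cocitation":
                scores[(a, b)] += 1.0  # baseline ignores proximity
            elif mode == "prox_min":
                scores[(a, b)] += max(weights)  # smallest distance only
            elif mode == "prox_sum":
                scores[(a, b)] += sum(weights)  # all mention pairs
            elif mode == "prox_mean":
                scores[(a, b)] += sum(weights) / len(weights)
            else:
                raise ValueError(f"unknown mode: {mode}")
    return dict(scores)


def recommend(scores, seed, k=5):
    """Return up to k papers most strongly paired with the seed paper."""
    related = []
    for (a, b), score in scores.items():
        if a == seed:
            related.append((b, score))
        elif b == seed:
            related.append((a, score))
    # Highest score first; ties broken by identifier for determinism.
    related.sort(key=lambda item: (-item[1], item[0]))
    return [paper for paper, _ in related[:k]]


# Toy data: P is cited next to Y and far from X in both documents.
docs = [
    [("P", 0), ("Y", 1), ("X", 100)],
    [("P", 10), ("Y", 12), ("X", 90)],
]
baseline = recommend(pair_scores(docs, "cocitation"), "P")
proximity = recommend(pair_scores(docs, "prox_min"), "P")
print(baseline)   # ['X', 'Y'] (tie, broken by identifier)
print(proximity)  # ['Y', 'X'] (Y is cited closer to P)
```

In this toy data the baseline cannot separate X from Y because both are co-cited with P twice, while any of the proximity variants ranks Y first. The three variants only differ when a paper is cited more than once in the same document.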