Towards effective research recommender systems
Petr Knoth
CORE
Knowledge Media Institute, The Open University
United Kingdom
In this talk
• Why recommender systems for research
• Challenges in building recsys in research
• The CORE recommender system
• Our work on CPA-based recommender systems
Based on the following papers
• Knoth, P., Anastasiou, L., Charalampous, A., Cancellieri, M., Pearce, S., 
Pontika, N. and Bayer, V. (2017) Towards effective research recommender 
systems for repositories, Open Repositories 2017
• Hristakeva, M., Kershaw, D., Rossetti, M., Knoth, P., Pettit, B., Vargas, S. and 
Jack, K. (2017) Building recommender systems for scholarly information, 
Workshop: Scholarly Web Mining (SWM) at Tenth ACM International 
Conference on Web Search and Data Mining (WSDM2017), Cambridge, UK
• Knoth, P. and Khadka, A. Can we do better than Co-Citations? - Bringing 
Citation Proximity Analysis from idea to practice in research article 
recommendation. (under review)
Why recommender systems for research
What use case do research recommender systems address?
“As a user, I want to receive recommendations about 
content (papers, datasets, software, people to follow, 
grant opportunities, methods, conferences, etc.) that is of 
interest to me, so I can continuously increase knowledge 
in my field.”
[COAR Next Generation Repositories WG, 2017]
What effect can it have?
• Increase the accessibility (Azzopardi & Vinay, 2008) of resources in 
repositories
• People access resources on CORE twice as often via its recommender 
system as via search.
   
Why is it needed?
An essential glue to link related content from a global 
distributed network of digital libraries.
Challenges in building recsys for research
   
Common recsys approaches
Collaborative filtering (CF) vs content-based filtering (CBF)
   
Common recsys approaches
Personalised vs non-personalised
   
Aspects to optimise a recommender system for
• Type of recs:
• Novelty/recency
• Relevance
• Familiarity
• Serendipity
• Audience: post-doc, lecturer, professor, etc.
• Use case: reviewing literature, staying in touch, 
learning about new field
More information at: http://dl.acm.org/citation.cfm?id=3057152 
Challenges in building recsys for research?
• Fast access to a global pool of research literature
• Access to user interaction data for:
• CF
• Personalisation
• Ground truths, online testing environments
• Global sign-on (across digital libraries) vs privacy
The CORE recommender system
Recommendations as a service
CORE provides a non-personalised CBF-based 
recommendation system for articles from across the 
global network of repositories.
• 8 million full texts
• 77 million metadata records
• 2683 repositories
   
OR2017 paper: https://arxiv.org/abs/1705.00578 
Recs delivered via:
• a plugin (EPrints, DSpace + general repository)
• the CORE API
Recommendable items in the CORE recommender
   
How does the CORE recommender work?
• Currently an article-to-article recommender system only
• Enrichment, e.g. identifiers, document type and 
citation data, prior to recsys.
• Features: 
• Textual: title, authors, abstract, full text!
• Recency: publication year
• Popularity: citations, readership
• Document type (thesis/article/slides)
• Others: subject field, Citation Proximity 
Analysis/Co-citations (in progress)
• Post-filtering using record quality
• Feedback (crowdsourcing a black list)
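The textual part of the feature list above can be illustrated with a small content-based sketch. It is not the CORE implementation: the tokenizer, the smoothed TF-IDF weighting and the function names (`tfidf_vectors`, `cosine`, `recommend`) are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption); terms found in every document get weight 0.
        vectors.append({
            term: count * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(docs, query_index, k=5):
    """Indices of the k documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scored = [(cosine(query, vec), i)
              for i, vec in enumerate(vectors) if i != query_index]
    # Highest similarity first; ties broken by document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

In practice the text passed in would be the concatenated title, abstract and full text, and the similarity would be only one of the signals that are later combined with recency, popularity and document type.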
   
Combining features
• Evaluating different ranking functions (P, R, MAP, 
NDCG, etc.):
• Weights for boosting
• Scaling function (e.g. exp decay for recency)
• Offline ground truths:
• MAG citation assumption
• MAG co-citation assumption
• Learning to rank (not done yet)
• Online A/B testing (not done yet)
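A minimal sketch of how boosting weights and a scaling function might be combined into one ranking score. The linear combination, the half-life parameter, the log scaling of citations and all names below are assumptions for illustration; the slides do not specify the actual ranking function.

```python
import math


def recency_score(pub_year, current_year, half_life_years=5.0):
    """Exponential decay: 1.0 for the current year, 0.5 after one half-life."""
    age = max(0, current_year - pub_year)
    return 0.5 ** (age / half_life_years)


def popularity_score(citations):
    """Log-scale citation counts so a few highly cited papers do not dominate."""
    return math.log1p(citations)


def combined_score(candidate, current_year, weights):
    """Weighted sum of text similarity, recency and popularity (hypothetical)."""
    return (weights["text"] * candidate["sim"]
            + weights["recency"] * recency_score(candidate["year"], current_year)
            + weights["popularity"] * popularity_score(candidate["citations"]))


def rank(candidates, current_year, weights):
    """Return candidates sorted by combined score, best first."""
    return sorted(candidates,
                  key=lambda c: combined_score(c, current_year, weights),
                  reverse=True)
```

Different weight settings can then be compared offline against a ground truth (for example the citation or co-citation assumption) using the ranking metrics listed above.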
   
Distinctive features of the CORE recommender
• Use of full text rather than just abstracts
• Ensure open access availability of recommended 
articles (recsys for all)
• Free service 
• Integration with repositories and availability over API
More information: https://arxiv.org/abs/1705.00578 
Our work on CPA-based recommender systems
   
Can we do better than co-citations? 
• Build on the idea introduced by Beel et al. [1], putting 
the concept of Citation Proximity Analysis (CPA) into 
practice.  
• Extending the co-citation assumption: “the more 
often two articles are co-cited in a document, the 
more likely they are related” taking proximity into 
account
• Introducing and evaluating new proximity functions
   
Method
Proximity functions 
1. Co-citations baseline (no proximity information)
2. ProxMin
3. ProxSum
4. ProxMean
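The slides name the proximity functions but do not define them, so the sketch below is only one plausible reading, not the authors' formulas. It assumes each citing document is represented as a mapping from cited paper id to the character offsets of its in-text citations, takes the smallest offset distance per co-citing document, converts it to a weight 1 / (1 + distance), and aggregates the weights by minimum, sum or mean. The data layout, the weight formula and the function names are all hypothetical.

```python
def pair_distances(citing_docs, a, b):
    """Smallest character distance between citations of a and b,
    one value per document that cites both."""
    distances = []
    for positions in citing_docs:
        if a in positions and b in positions:
            distances.append(min(abs(x - y)
                                 for x in positions[a]
                                 for y in positions[b]))
    return distances


def cocitation_count(citing_docs, a, b):
    """Baseline: number of documents citing both a and b (no proximity)."""
    return len(pair_distances(citing_docs, a, b))


def proximity_weights(citing_docs, a, b):
    """Hypothetical weight per co-citing document: closer citations weigh more."""
    return [1.0 / (1.0 + d) for d in pair_distances(citing_docs, a, b)]


def prox_min(citing_docs, a, b):
    """Smallest per-document weight (assumed definition)."""
    weights = proximity_weights(citing_docs, a, b)
    return min(weights) if weights else 0.0


def prox_sum(citing_docs, a, b):
    """Sum of per-document weights (assumed definition)."""
    return sum(proximity_weights(citing_docs, a, b))


def prox_mean(citing_docs, a, b):
    """Mean per-document weight (assumed definition)."""
    weights = proximity_weights(citing_docs, a, b)
    return sum(weights) / len(weights) if weights else 0.0
```

Under these assumptions the sum variant behaves like a proximity-weighted co-citation count, since it grows both with the number of co-citing documents and with how close the two citations appear in each of them.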
   
Initial evaluation
• Corpus of 350k papers from CORE
• 6 topics, 5 recs each, 4 systems, 10 annotators (1,200 
judgements); agreement measured with Fleiss's kappa
• ProxSum improves P@5 by 25% over the co-citation baseline
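For reference, the two measures used in this evaluation can be sketched as follows. The formulas are the standard definitions of precision at k and Fleiss's kappa; the input layout (binary relevance lists, per-item category counts) and the function names are assumptions, and no numbers from the study are reproduced here.

```python
def precision_at_k(relevance, k=5):
    """Fraction of the top-k recommendations judged relevant.
    relevance: list of 0/1 judgements in ranked order."""
    return sum(relevance[:k]) / k


def relative_improvement(system, baseline):
    """Relative gain of a system score over a (non-zero) baseline score."""
    return (system - baseline) / baseline


def fleiss_kappa(counts):
    """Fleiss's kappa for inter-annotator agreement.
    counts[i][j]: number of annotators who assigned item i to category j.
    Every item must be judged by the same number of annotators (at least 2),
    and more than one category must be in use."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_categories)]
    # Observed agreement per item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items    # mean observed agreement
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)
```

With 10 annotators and binary relevance judgements, each row of `counts` would hold two numbers summing to 10, one row per judged recommendation.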
Conclusions
• Research recommenders have value, but potential not 
fully realised yet
• Data availability and interoperability needed:
• Recommendable content/entities
• User interaction data and profiles
• Recommendations as a service: CORE recommender
• Potential for looking into new features, such as CPA