David Pride and Petr Knoth
Knowledge Media Institute, The Open University, UK
Incidental or influential: Challenges in automatic detection of citation importance

Introduction
• Current quantitative research evaluation methods are largely 
based on citation counts.
    – Journal level: Journal Impact Factor (JIF)
    – Author level: h-index, g-index
• None of these metrics account for citation type or sentiment.
• Open Access means increased availability of full-text papers and 
articles for analysis. 
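For reference, both author-level metrics named above reduce to arithmetic over raw citation counts, which is why they cannot tell an incidental citation from an influential one. The sketch below uses the standard definitions; the citation counts are hypothetical, and the g-index is capped at the number of papers (some variants allow it to exceed that).

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have >= g*g citations.

    Capped at the number of papers in the list.
    """
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one author's papers.
counts = [10, 8, 5, 4, 3]
print(h_index(counts), g_index(counts))
```

Neither function ever sees why or where a paper was cited; only the count enters the calculation.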
Citation Context Analysis
• Discovers where the citation occurs in 
the full text of a document.
• Identifies the type, sentiment polarity 
or influence of the citation.
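A minimal sketch of the first step, finding where each citation occurs in the full text. It assumes the paper has already been split into named sections and that citations use numeric bracket markers such as [3]; both are assumptions made for illustration, and real PDFs need far more robust extraction.

```python
import re

# Assumed marker style: numeric brackets such as [3].
# Author-year styles would need a different pattern.
MARKER = re.compile(r"\[(\d+)\]")


def locate_citations(sections):
    """Map each reference number to the (section, sentence) pairs mentioning it."""
    contexts = {}
    for name, text in sections.items():
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            for ref in MARKER.findall(sentence):
                contexts.setdefault(int(ref), []).append((name, sentence))
    return contexts


# Hypothetical two-section paper.
paper = {
    "Introduction": "Prior work studied citation function [1]. We extend it.",
    "Method": "We reuse the features of [1] and compare against [2].",
}
contexts = locate_citations(paper)
for ref, hits in sorted(contexts.items()):
    print(ref, [section for section, _ in hits])
```

The sentence stored with each hit is the citation context that a type, sentiment or influence classifier would then inspect.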
Why do we cite something?
• Giving credit for related work
• Identifying methodology / equipment
• Providing background reading 
• Correcting one’s own work 
• Correcting the work of others 
• Criticizing previous work 
• Substantiating claims
• Disputing priority claims of others 
    – negative claims
• Providing leads to poorly disseminated, poorly indexed, 
    or uncited work
• Authenticating data and classes of fact, e.g. physical constants
• Identifying original publications in which an idea or concept 
    was discussed.
• Identifying original publications or other work describing an 
    eponymic concept or term
• Disclaiming work or ideas of others 
    – negative homage
Methodology
• Review of previous citation classification studies (Zhu, 2015; 
Valenzuela, 2015; Teufel, 2006). 
• Comparative analysis of two of these studies (Zhu, 2015; 
Valenzuela, 2015)
• Goals:
• Understand features and datasets used.
• Identify which features perform best at identifying citation 
influence.
• Investigate reproducibility of these studies.
Training a Citation Classification Model
• Human annotators label a set of citing / cited paper pairs.
• Citations are classified according to:
    – Sentiment
    – Type (uses method, compares works, continues work, …)
    – Influence
• The result is an annotated ‘gold standard’ dataset.
• Classification features (author overlap, direct citations, abstract similarity, …) are extracted.
• These are used to produce the trained classifier.
Citation Classification Workflow
• Input: Paper X
• Citation extraction: in-text citations, e.g. Author et al. (2017), are matched to the reference list:
    [1] Knoth, P., Anastasiou, L., Charalampous, A., Cancellieri, M., Pearce, S., Pontika, N., Bayer, V.: Towards effective research recommender systems for repositories. In: Proceedings of Open Repositories 2017
    [3] ………
    [4] ………
    [n] ………
• Citing / cited paper pairs
• Feature extraction: author overlap, direct citations, abstract similarity, …
• Classifier
• Output (Paper, Citation, Label):
    X, [1], incidental
    X, [2], incidental
    X, [3], influential
    X, [4], incidental
    X, [n], …….
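The workflow can be sketched end to end as below. This is an illustration under stated assumptions, not the pipeline used in the study: the `Paper` structure, the two features computed, and especially `placeholder_classifier` (a fixed threshold standing in for the trained model) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    title: str
    authors: frozenset
    # Reference key -> number of in-text mentions (only used for the citing paper).
    in_text_mentions: dict = field(default_factory=dict)


def extract_features(citing, ref_key, cited):
    """Compute two of the features named on the slide for one citing / cited pair."""
    return {
        "direct_citations": citing.in_text_mentions.get(ref_key, 0),
        "author_overlap": bool(citing.authors & cited.authors),
    }


def placeholder_classifier(features):
    """Hypothetical stand-in for a trained model; the threshold is arbitrary."""
    return "influential" if features["direct_citations"] >= 3 else "incidental"


def classify_paper(citing, references, classifier):
    """Return (paper, citation, label) rows, one per reference."""
    rows = []
    for ref_key, cited in references.items():
        label = classifier(extract_features(citing, ref_key, cited))
        rows.append((citing.title, ref_key, label))
    return rows


# Hypothetical input: paper X cites [1] once and [3] four times.
paper_x = Paper("X", frozenset({"A. Author"}), {"[1]": 1, "[3]": 4})
references = {
    "[1]": Paper("Ref 1", frozenset({"B. Writer"})),
    "[3]": Paper("Ref 3", frozenset({"A. Author", "C. Scholar"})),
}
rows = classify_paper(paper_x, references, placeholder_classifier)
for row in rows:
    print(row)
```

Passing the classifier in as a function keeps feature extraction independent of the model, so a trained classifier can replace the placeholder without touching the rest.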
Ground Truth Dataset
• 2 Annotators – binary influential / 
important judgements.
• 465 Cited / Citing Pairs
•  ~15% of all citations are influential / 
important
•  ~4% of all citations are negative
[Bar chart: number of citing / cited pairs labelled Incidental vs. Influential; y-axis from 0 to 400]
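The class imbalance has a practical consequence that a little arithmetic makes visible. The figures below are approximations derived from the rounded ~15% on the slide, not exact counts from the dataset.

```python
total_pairs = 465
influential_share = 0.15  # approximate share quoted on the slide

# Approximate class sizes implied by the rounded percentage.
influential = round(total_pairs * influential_share)
incidental = total_pairs - influential

# A trivial classifier that labels every citation "incidental".
accuracy = incidental / total_pairs
recall_on_influential = 0 / influential

print(influential, incidental)
print(f"accuracy={accuracy:.2f} recall={recall_on_influential:.2f}")
```

A classifier that never finds an influential citation still scores about 85% accuracy on such data, so accuracy alone says little here and precision and recall on the influential class are more informative.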
Selection of classification features
Valenzuela et al. features:
• F1 Total number of direct citations
• F2 Number of direct citations per section
• F3 Total number of indirect citations and number of indirect citations per section
• F4 Author overlap (Boolean)
• F5 Citation is considered helpful (Boolean)
• F6 Citation appears in table or caption
• F7 1 / Number of references
• F8 Number of paper citations / all citations
• F9 Similarity between abstracts
• F10 PageRank
• F11 Number of citing papers after transitive closure
• F12 Field of cited paper
Zhu et al. features:
1.1 countsInPaper_whole
1.2 countsInPaper_secNum
1.3 countsInPaper_related
1.4 countsInPaper_intro
1.5 countsInPaper_core
2.1 sim_titleTitle
2.2 sim_titleCore
2.3 sim_titleIntro
2.4 sim_titleConcl
2.5 sim_titleAbstr
2.6 sim_contextTitle
2.7 sim_contextIntro
2.8 sim_contextConcl
2.9 sim_contextAbstr
3.1 contextMeta_authorMentioned
3.2 contextMeta_appearAlone
3.3 contextMeta_appearFirst
3.4 contextLex_relevant
3.5 contextLex_recent
3.6 contextLex_extreme
3.7 contextLex_comparative
3.8 contextLexOsg_wnPotency
….
5.1 aux_citeCount
5.2 aux_selfCite
5.3 aux_yearDiff
Fewer than half of these features 
performed better than the 
baseline.
(Valenzuela et al. 2015)
Of 40 features, a combination of 
just four features provided the 
best performance. 
(Zhu et al. 2015)
Irreproducible features
F5 – Citation is considered helpful (Boolean)
How is ‘considered helpful’ defined? No cue phrases are provided.
F10 – PageRank
Based on which corpus? Again, details are not provided.
F12 – Field of cited paper. 
This feature is not complete. 
Reproducible features
F1 – Number of direct citations / countsInPaper_whole
F4 – Author overlap / aux_selfCite
F9 – Abstract similarity
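Abstract similarity is reproducible because it needs only the two abstracts. The sketch below uses cosine similarity over raw term counts; this particular measure and the tokenisation are assumptions, since the slides do not state which similarity function either study used.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased bag of words; tokens are runs of ASCII letters."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(abstract_a, abstract_b):
    """Cosine of the angle between the two term-count vectors (0.0 to 1.0)."""
    va, vb = term_vector(abstract_a), term_vector(abstract_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty abstract shares nothing with anything
    return dot / (norm_a * norm_b)


# Hypothetical abstracts.
citing = "We classify citations by influence using full text features."
cited = "Identifying influential citations from the full text of papers."
print(round(cosine_similarity(citing, cited), 3))
```

Stop-word removal, stemming or TF-IDF weighting would change the scores, which is one reason a study needs to state its exact recipe for the feature to be reproducible.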
Evaluation
• Valenzuela et al. measure precision at a recall of 0.90 (P@R 0.90).
• This masks some of the predictive ability of the features.
• Zhu et al. measure performance in terms of Pearson’s r correlation.
• Our study reports results in both formats.
• Random Forest classifier gave the best results.
[Figure: P/R curve for Abstract Similarity; axes: Recall, Precision]
• Classifier initially performs well.
• After identifying ~20% the classifier then struggles.
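Precision at a fixed recall can be computed as sketched below. This is one common reading of the metric (rank by score, stop at the first cut-off where recall reaches the target); the exact protocol of the original study is not given on the slides, and the scores and labels here are hypothetical.

```python
def precision_at_recall(scores, labels, target_recall=0.90):
    """Precision at the first rank cut-off where recall reaches target_recall.

    scores: higher means more likely influential.
    labels: 1 for influential, 0 for incidental.
    Ties between scores are broken by input order.
    """
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positive = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_positive += label
        if true_positive / total_positive >= target_recall:
            return true_positive / k
    return 0.0  # not reached: recall is always 1.0 at the final cut-off


# Hypothetical ranking: the top hit is correct, the second positive comes late.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 0, 0, 1]
print(precision_at_recall(scores, labels))        # whole list is needed
print(precision_at_recall(scores, labels, 0.5))   # only the first item is needed
```

The example shows how a single high-recall operating point can hide a feature that ranks its first hits well: precision is perfect at recall 0.5 but low at 0.90, because every remaining positive must be recovered before the cut-off is reached.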
Results of experiments 
• Features tested using Valenzuela dataset
• Results measured in terms of P/R and Pearson r
• Difference in Author Overlap – different datasets
• Abstract Similarity shows highest r value of tested features
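For the correlation-based view of the results, Pearson's r between a feature's values and the binary influence labels can be computed directly. A minimal sketch with hypothetical feature values and labels:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical abstract-similarity values and labels (1 = influential).
feature_values = [0.1, 0.2, 0.8, 0.9]
influence_labels = [0, 0, 1, 1]
print(round(pearson_r(feature_values, influence_labels), 3))
```

Unlike precision at a fixed recall, r summarises the relationship over all pairs at once, which is why reporting both gives a fuller picture of a feature.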
The challenges 
• Lack of large ‘ground truth’ dataset for training classifiers.
• Complex or irreproducible features. 
• PDF Extraction issues. 
Conclusions
• Lack of a massive-scale gold-standard dataset.
• Raises questions regarding publication of datasets as well as results.
• Abstract Similarity shown to be a better predictor of citation influence 
than demonstrated by earlier studies.
• Serious concerns with reproducibility of previously tested features.
• Significant variances in quality of PDF extraction tools. 
Thank you for listening
For full details of the work being done by CORE and KMi visit: 
http://www.core.ac.uk
http://www.kmi.open.ac.uk
petr.knoth@open.ac.uk david.pride@open.ac.uk 
Citation Classification Schemes
Teufel, S., Siddharthan, A., & Tidhar, D. (2006, July). Automatic classification of citation function. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (pp. 103-110). Association for Computational Linguistics.
Zhu, X., Turney, P., Lemire, D., & Vellino, A. (2015). Measuring academic influence. Journal of the Association for Information Science and Technology, 66(2), 408-427.
Valenzuela, M., Ha, V., & Etzioni, O. (2015, April). Identifying Meaningful Citations. In AAAI Workshop: Scholarly Big Data.