David Pride and Petr Knoth
Knowledge Media Institute, The Open University, UK

Incidental or influential: Challenges in automatic detection of citation importance

Introduction
• Current quantitative research evaluation methods are largely based on citation counts.
  : Journal Level – Journal Impact Factor (JIF)
  : Author Level – h-index, g-index
• None of these metrics account for citation type or sentiment.
• Open Access means increased availability of full-text papers and articles for analysis.

Citation Context Analysis
• Discovers where the citation occurs in the full text of a document.
• Identifies the type, sentiment polarity or influence of the citation.
[Figure: example in-text citation, "Author et al. (2017)"]

Why do we cite something?
• Giving credit for related work
• Identifying methodology / equipment
• Providing background reading
• Correcting one’s own work
• Correcting the work of others
• Criticizing previous work
• Substantiating claims
• Disputing priority claims of others – negative claims
• Providing leads to poorly disseminated, poorly indexed, or uncited work
• Authenticating data and classes of fact – physical constants
• Identifying original publications in which an idea or concept was discussed
• Identifying original publications or other work describing an eponymic concept or term
• Disclaiming work or ideas of others – negative homage

Methodology
• Review of previous citation classification studies (Zhu, 2015; Valenzuela, 2015; Teufel, 2006).
• Comparative analysis of two of these studies (Zhu, 2015; Valenzuela, 2015).
• Goals:
  • Understand features and datasets used.
  • Identify which features perform best at identifying citation influence.
  • Investigate reproducibility of these studies.

Training a Citation Classification Model
[Diagram elements: Set of citing / cited paper pairs; Human Annotators; Annotated ‘Gold Standard’ dataset; Classification Features (Author Overlap, Direct Citations, Abstract Similarity, ….); Trained Classifier]
Citations classified according to: SENTIMENT, TYPE, INFLUENCE
• Uses method
• Compares works
• Continues work
• …

Citation Classification Workflow
• INPUT: Paper X
• Citation Extraction
  Author et al. (2017)
  [1] Knoth, P., Anastasiou, L., Charalampous, A., Cancellieri, M., Pearce, S., Pontika, N., Bayer, V.: Towards effective research recommender systems for repositories. In: Proceedings of Open Repositories 2017
  [3] ………
  [4] ………
  [n] ………
• Citing / Cited Paper Pairs
• Feature Extraction: Author Overlap, Direct Citations, Abstract Similarity, ….
• Classifier
• Output:
  Paper, Citation, Label
  X, [1], incidental
  X, [2], incidental
  X, [3], influential
  X, [4], incidental
  X, [n], …….

Ground Truth Dataset
• 2 Annotators – binary influential / important judgements.
• 465 Cited / Citing Pairs
• ~15% of all citations are influential / important
• ~4% of all citations are negative
[Bar chart: number of citations labelled Incidental vs. Influential; axis from 0 to 400]

Selection of classification features
Valenzuela et al. Features
• F1 Total number of direct citations
• F2 Number of direct citations per section
• F3 Total number of indirect citations and number of indirect citations per section
• F4 Author overlap (Boolean)
• F5 Citation is considered helpful (Boolean)
• F6 Citation appears in table or caption
• F7 1 / Number of references
• F8 Number of paper citations / all citations
• F9 Similarity between abstracts
• F10 PageRank
• F11 Number of citing papers after transitive closure
• F12 Field of cited paper
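
A few of the simpler features in the list above can be sketched in code to show what the feature extraction step produces for one citing / cited pair. This is a minimal illustration, not the implementation used in either study: the function names, the dictionary layout of a paper, and the use of raw term counts for the abstract similarity are all assumptions made here for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def author_overlap(citing_authors, cited_authors):
    """F4-style Boolean: True if the two papers share at least one author name."""
    citing = {name.strip().lower() for name in citing_authors}
    cited = {name.strip().lower() for name in cited_authors}
    return bool(citing & cited)


def abstract_similarity(abstract_a, abstract_b):
    """F9-style score: cosine similarity of raw term-count vectors.

    Assumption: plain term counts; the original studies may weight terms differently.
    """
    a = Counter(tokenize(abstract_a))
    b = Counter(tokenize(abstract_b))
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty abstract has no defined direction
    return dot / (norm_a * norm_b)


def inverse_reference_count(num_references):
    """F7-style value: 1 / number of references in the citing paper."""
    return 1.0 / num_references if num_references > 0 else 0.0


def extract_features(citing, cited, direct_citations):
    """Build one feature row for a citing / cited pair.

    The keys of `citing` and `cited` are a hypothetical layout for this sketch.
    """
    return {
        "direct_citations": direct_citations,
        "author_overlap": author_overlap(citing["authors"], cited["authors"]),
        "inverse_references": inverse_reference_count(citing["num_references"]),
        "abstract_similarity": abstract_similarity(citing["abstract"], cited["abstract"]),
    }


if __name__ == "__main__":
    # Made-up example pair, for illustration only.
    paper_x = {
        "authors": ["A. Author", "B. Writer"],
        "num_references": 20,
        "abstract": "We classify citations by influence using full text.",
    }
    reference_1 = {
        "authors": ["B. Writer"],
        "abstract": "Full text recommender systems for repositories.",
    }
    print(extract_features(paper_x, reference_1, direct_citations=3))
```

Rows of this kind, one per citing / cited pair, are what a classifier would be trained on together with the annotated labels.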

Zhu et al. Features
1.1 countsInPaper_whole
1.2 countsInPaper_secNum
1.3 countsInPaper_related
1.4 countsInPaper_intro
1.5 countsInPaper_core
2.1 sim_titleTitle
2.2 sim_titleCore
2.3 sim_titleIntro
2.4 sim_titleConcl
2.5 sim_titleAbstr
2.6 sim_contextTitle
2.7 sim_contextIntro
2.8 sim_contextConcl
2.9 sim_contextAbstr
3.1 contextMeta_authorMentioned
3.2 contextMeta_appearAlone
3.3 contextMeta_appearFirst
3.4 contextLex_relevant
3.5 contextLex_recent
3.6 contextLex_extreme
3.7 contextLex_comparative
3.8 contextLexOsg_wnPotency
….
5.1 aux_citeCount
5.2 aux_selfCite
5.3 aux_yearDiff

Fewer than half of the Valenzuela et al. features performed better than the baseline. (Valenzuela et al.
2015)
Of 40 features, a combination of just FOUR features provided the best performance. (Zhu et al. 2015)

Irreproducible features
• F5 – Citation is considered helpful (Boolean): How is ‘considered helpful’ defined? No cue phrases are provided.
• F10 – PageRank: Based on what corpora? Again, details are not provided.
• F12 – Field of cited paper: This feature is not complete.

Reproducible features
• F1 – Number of Direct Citations / countsInPaper_whole
• F4 – Author Overlap / aux_selfCite
• F9 – Abstract Similarity

Evaluation
• Valenzuela et al. measure precision at recall 0.90 (P@R 0.90).
• This masks some of the predictive ability of the features.
• Zhu et al. measure performance in terms of Pearson r correlation.
• Our study shows results in both formats.
• Random Forest Classifier = best results

P/R curve for Abstract Similarity
[Figure: precision/recall curve; axes Recall and Precision]
Classifier initially performs well
After identifying ~20% the classifier then struggles

Results of experiments
• Features tested using Valenzuela dataset
• Results measured in terms of P/R and Pearson r
• Difference in Author Overlap – different datasets
• Abstract Similarity shows highest r value of tested features

The challenges
• Lack of large ‘ground truth’ dataset for training classifiers.
• Complex or irreproducible features.
• PDF Extraction issues.

Conclusions
• Lack of massive scale gold-standard dataset.
• Raises questions regarding publication of datasets as well as results.
• Abstract Similarity shown to be better predictor of citation influence than demonstrated by earlier studies.
• Serious concerns with reproducibility of previously tested features.
• Significant variances in quality of PDF extraction tools.

Thank you for listening
For full details of the work being done by CORE and KMi visit:
http://www.core.ac.uk
http://www.kmi.open.ac.uk
petr.knoth@open.ac.uk
david.pride@open.ac.uk

Citation Classification Schemes
Teufel, S., Siddharthan, A., & Tidhar, D. (2006, July). Automatic classification of citation function. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (pp. 103-110). Association for Computational Linguistics.
Zhu, X., Turney, P., Lemire, D., & Vellino, A. (2015). Measuring academic influence. Journal of the Association for Information Science and Technology, 66(2), 408-427.
Valenzuela, M., Ha, V., & Etzioni, O. (2015, April). Identifying Meaningful Citations. In AAAI Workshop: Scholarly Big Data.
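
Supplementary sketch: the two evaluation formats named on the Evaluation slide, precision at a fixed recall level and Pearson r, can be computed from a list of gold labels and classifier scores roughly as follows. This is an illustrative reading rather than the exact procedure of either study. In particular, defining precision at recall as "precision at the first rank cut-off whose recall reaches the target" is an assumption, and the function names are chosen here for the example.

```python
import math


def precision_at_recall(labels, scores, target_recall=0.90):
    """Precision at the first rank cut-off whose recall reaches target_recall.

    labels: 1 for influential, 0 for incidental.
    scores: classifier scores, higher meaning more likely influential.
    Ties in score keep their input order (Python's sort is stable).
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positive = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positive += label
        if true_positive / total_positive >= target_recall:
            return true_positive / rank
    # Not reached: recall is exactly 1.0 once every item has been ranked.
    return total_positive / len(ranked)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up labels and feature values, for illustration only.
    gold = [1, 0, 1, 0, 0]
    feature = [0.9, 0.8, 0.7, 0.2, 0.1]
    print(precision_at_recall(gold, feature, target_recall=0.90))
    print(pearson_r(feature, gold))
```

Reporting both numbers for the same feature shows how a single precision figure at one recall level can hide behaviour that the correlation, or the full P/R curve, makes visible.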