Text-mining methods used for information extraction in plant scientific papers (CC-BY)

6. Information extraction evaluation

Evaluation of the automatically annotated corpus

How do we assess the quality of predictions? One of the simplest methods is a comparison between:
- what a human would have produced when annotating by hand (the gold annotations), and
- the predictions of the information extraction system.

The principle is to take a sample of texts and measure the difference between the two sets of annotations: the human's and the artificial intelligence system's. We usually use indices such as recall, precision and F-measure.

Evaluation indices

- Recall: the number of correct entities/relations found, relative to the total number of correct entities/relations that exist.
- Precision: the number of correct entities/relations found, relative to the total number of entities/relations found.
- F-measure: the harmonic mean of precision and recall, combining both into a single score.

Example: correct identification of a Tissue-type entity.

Manual annotation:
- Green: entity tagged as "Tissue"
- Orange: entity poorly annotated, or not annotated

Automatic annotation, compared against the manual one: we look at what the system returned and what it should have returned. (Note: we do not look at what it was right not to return.)
- Red: entity not returned, but should have been: false negative
- Blue: entity mistakenly returned: false positive
- Green: entity correctly returned: true positive
- White: entity not returned, and rightly so: true negative

Recall = True Positives / (True Positives + False Negatives)   (correctly found / total to find)
Precision = True Positives / (True Positives + False Positives)   (correctly found / total returned)

In the example: Recall = 6 / (6 + 3) = 6/9; Precision = 6 / (6 + 2) = 6/8.
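The comparison above can be sketched in Python as a set comparison between gold and predicted entity spans. The spans and offsets below are hypothetical, chosen only so the counts reproduce the slide's worked example (TP = 6, FP = 2, FN = 3):

```python
# Minimal sketch of entity-level evaluation, assuming annotations are
# represented as sets of (start, end, label) spans with exact matching.

def evaluate(gold, predicted):
    """Compare predicted spans against gold spans (exact match)."""
    tp = len(gold & predicted)   # green: correctly returned
    fp = len(predicted - gold)   # blue: mistakenly returned
    fn = len(gold - predicted)   # red: missed (should have been returned)
    recall = tp / (tp + fn)      # correctly found / total to find
    precision = tp / (tp + fp)   # correctly found / total returned
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical spans reproducing the slide's counts:
# 9 gold spans, 8 predicted spans, 6 in common.
gold = {(i, i + 1, "Tissue") for i in range(9)}
predicted = ({(i, i + 1, "Tissue") for i in range(6)}
             | {(20, 21, "Tissue"), (30, 31, "Tissue")})

p, r, f = evaluate(gold, predicted)
print(f"precision = {p:.3f}")   # 6/8 = 0.750
print(f"recall    = {r:.3f}")   # 6/9 ≈ 0.667
print(f"F-measure = {f:.3f}")   # ≈ 0.706
```

Note that the true negatives (white) never enter the formulas: in information extraction the set of things the system was right *not* to return is effectively unbounded, which is why precision and recall are preferred over accuracy here.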