The purpose of this introductory course is to provide a starting point for understanding the concepts of Text and Data Mining (TDM), a field that is gradually gaining more attention from funders and researchers. The course is primarily intended for research support administrative staff, but others, such as researchers, librarians and repository managers, may also find it useful.

Text and Data Mining is the process of extracting high-quality information from text or data to answer previously unknown questions (OpenMinTeD, 2016).

The course contains six sections: 

  1. Introduction to Text Mining 
  2. Do you speak TDM? Understanding the key concepts and areas of TDM
  3. Text and Data Mining and Licensing
  4. What Research Support Staff can do for Text and Data Mining 
  5. Time to Text and Data Mine: Practical Activities 
  6. Glossaries on Text and Data Mining

The estimated time to complete the course is 6 hours.

Although the course includes a section on “Text and Data Mining and Licensing”, that section is brief. When building this course, we wanted to focus on the technical side of TDM, so we provide a practical guide with various TDM activities and examples. We hope that users with little or no technical knowledge will be able to follow the activities, gain an understanding of TDM methods and practise with the TDM exercises.

This course was created as part of the EU-funded “Opening Infrastructure for Text and Data Mining - OpenMinTeD” project, in collaboration with the Office of Scholarly Communication, Cambridge University Library, UK.

Course contributors:
Dr. Marta Busse, University of Cambridge, UK
Dr. Deborah Hansen, University of Cambridge, UK
Martine Oudenhoven, LIBER, The Netherlands
Dr. Nancy Pontika, The Open University, UK