In scholarly communication, text and data mining (TDM) is already an established practice in some scientific fields. In the life sciences and computer science, for example, it is used to extract meaningful information from large volumes of full-text documents or datasets, which can then serve further purposes such as automatically filling in metadata records or creating semantic relationships.

However, the adoption of TDM practices falls short on two fronts. First, TDM has not yet received attention from the majority of potential providers of scientific text and data, including the repository community. Repositories, both institutional and subject-based, host collections of full-text content and datasets, and some of this content carries an open license that permits further manipulation. Nonetheless, after sending a call to the United Kingdom Council of Research Repositories (UKCoRR) mailing list, we found that only a limited number of TDM projects used repository collections as their primary source of information. Second, carrying out TDM is often considered a tedious exercise, with difficult-to-use tools and uncertain access to databases. To address these challenges, the EU-funded project OpenMinTeD aims to create an infrastructure that fosters and facilitates the use of TDM technologies in the field of scientific publications, targeting both domain users and TDM experts.

By the end of this workshop, attendees will be able to:

  • Understand the concept of TDM and how it can be applied to the repository community
  • Identify the top challenges around TDM
  • Explore how TDM can benefit subject and institutional repositories and their communities
  • Demonstrate the benefits of TDM to the research community in their academic institution
  • Share examples of how repository managers can assist in the development of TDM projects using existing repository collections
  • Plan future steps for addressing the current shortcomings in the system
  • Predict the future of repositories with regard to TDM practices

Workshop blogpost.

Agenda

9:00
Repositories at the center of new scientific knowledge
Natalia Manola, Research and Innovation Center (ARC), OpenMinTeD Coordinator
9:25
How can repositories support the text and data mining of their content and why?
Petr Knoth and Nancy Pontika, CORE, The Open University
9:40
TDM: Demystifying legal barriers and identifying real obstacles
Thomas Margoni, University of Stirling and CREATe, University of Glasgow
9:55
Questions & Answers
10:10
Legal, Institutional and Technical Issues around Text and Data Mining: an exercise
Thomas Margoni (Legal issues), Natalia Manola (Institutional issues), Petr Knoth (Technical issues)
11:35
Tentative steps in mining UK theses
Sara Gould, EThOS, British Library
11:45
Building Teaching and Learning Corpora with the British Library EThOS Collection
11:55
Small text-mining experiments working with digitized texts at British Library Labs
12:05
Jisc Open Access services and importance of text and data mining capabilities
Balviar Notay, Jisc
12:15
Conclusion & General Discussion
Petr Knoth, CORE, The Open University

Where

Dublin

Full details

Organisers: OpenMinTeD
Language: English
