Mining Repositories: How to assist the research and academic community in their text and data mining needs

In scholarly communication, text and data mining (TDM) is already an established practice in some scientific fields. In the life sciences and computer science, for example, it is used to extract meaningful information that can serve multiple further uses, such as automatically filling in metadata records or creating semantic relationships from a large volume of full-text documents or data sets. However, the adoption of TDM practices lags on two fronts. First, TDM has not yet received attention from the majority of potential providers of scientific text and data, including the repository community. Repositories, both institutional and subject-based, host collections of full-text content and datasets. Sometimes this content carries an open license, which allows it to be processed further. Nonetheless, after a call that we sent out to the United Kingdom Council of Research Repositories (UKCoRR) listserv, we discovered that only a limited number of TDM projects used repository collections as their primary source of information. Second, conducting TDM is often considered a tedious exercise, with difficult-to-use tools and uncertain access to databases. To address this challenge, the EU-funded project OpenMinTeD aims to enable the creation of an infrastructure that fosters and facilitates the use of TDM technologies in the field of scientific publications, targeting both domain users and TDM experts.

By the end of this workshop, attendees will be able to:

Understand the concept of TDM and how it can be applied in the repository community
Identify the top challenges around TDM
Explore how TDM can benefit subject and institutional repositories and their communities
Demonstrate the benefits of TDM to the research community in their academic institution
Share examples of how repository managers can assist in the development of TDM projects using existing repository collections
Plan future steps for addressing the current shortcomings in the system
Predict the future of repositories with regard to TDM practices

Workshop blogpost.