Text mining workflows for indexing archives with automatically extracted semantic metadata

Many digital libraries hold vast amounts of textual data, which makes finding information relevant to users a challenge. The unstructured and ambiguous nature of the natural language in which documents are written poses a barrier to the accessibility and discovery of information. This can be alleviated by indexing documents with semantic metadata, e.g., by tagging them with terms that indicate their “aboutness”. As manually indexing these documents is impracticable, automatic tools capable of generating semantic metadata and building search indexes have become attractive solutions. In this tutorial, we aim to demonstrate how digital library developers and managers (who do not necessarily have expertise in natural language processing and text mining) can use the Argo text mining platform to develop their own customised, modular workflows for automatic semantic metadata generation and search index construction. In this way, we provide digital library practitioners with the technical know-how needed to build semantic search indexes without any programming effort, owing to Argo’s graphical interface for workflow construction and execution. We believe that this will in turn allow digital libraries to build search systems that enable their users to find and discover information of interest more efficiently.

