A wealth of published research outcomes is now publicly available, largely thanks to active Open Access initiatives and mandates that continue to open up more research data. Even so, researchers still struggle to find specific elements of a research publication that would support their own work, such as an image, a diagram or a dataset on a particular topic (crop disease, for example) within the scientific literature. Such components are currently embedded in various types of publications and cannot be identified, described and retrieved as individual entities.
Moreover, manually locating such components is a challenging and time-consuming process, especially when the results come from large bibliographic databases such as FAO AGRIS, which currently provides access to more than 8 million bibliographic records.
OpenMinTeD (“Open Mining INfrastructure for TExt and Data”) is a Horizon 2020 project that aims to address such cases. More specifically, the project aims to provide solutions that allow the components contained in research publications to be identified and retrieved. Following a user query with specific terms, value-added services based on text-mining mechanisms will preview related images and datasets located inside the publications. End users will be able to click the preview of a related component (such as an image, a diagram or a dataset name) and will be redirected to its specific location in the publication to examine it in more detail. This is expected to significantly reduce the time needed to retrieve such individual components and, at the same time, to make traditional search mechanisms more efficient.
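The query-to-component flow described above can be illustrated with a minimal sketch. It is not the project's actual text-mining pipeline: it assumes full texts are already available as plain strings, that captions sit on their own line in a "Figure N." or "Table N:" form, and that simple keyword matching stands in for real text mining. The names `ComponentHit`, `CAPTION_RE` and `find_components` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ComponentHit:
    doc_id: str    # identifier of the publication (hypothetical)
    label: str     # e.g. "Figure 1" or "Table 2"
    caption: str   # caption text used as the preview
    offset: int    # character offset of the caption in the full text


# Assumed caption layout: a line starting with "Figure 1.", "Fig. 1:" or "Table 1:".
CAPTION_RE = re.compile(
    r"^(Figure|Fig\.|Table)\s+(\d+)[.:]\s*(.+)$",
    re.MULTILINE | re.IGNORECASE,
)


def find_components(docs, query):
    """Return captions that contain every query term, with their location.

    docs maps a document id to its full text; query is a space-separated
    list of terms. Matching is a plain case-insensitive substring test.
    """
    terms = [t.lower() for t in query.split()]
    hits = []
    for doc_id, text in docs.items():
        for m in CAPTION_RE.finditer(text):
            caption = m.group(3).strip()
            lowered = caption.lower()
            if all(t in lowered for t in terms):
                kind = "Table" if m.group(1).lower() == "table" else "Figure"
                label = f"{kind} {m.group(2)}"
                hits.append(ComponentHit(doc_id, label, caption, m.start()))
    return hits
```

Each hit carries the document identifier and a character offset, which is the kind of information a front end would need in order to redirect the user to the component's location inside the publication.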
In the context of the project, a number of use cases will be validated, including the use of the AGRIS repositories, possibly multilingual content from other publishers, public databases, and RSS feeds from the identified sources. The methodology will rely on an external call for technologies, building on INRA’s text-mining services. Applying these technologies and enabling such features over the more than 8 million bibliographic records (including full text) of the AGRIS database is expected to significantly facilitate the work of agri-food researchers who depend on the available literature for their own research.
The aim of OpenMinTeD is to establish an open and sustainable TDM platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific sources in a seamless way, in order to advance research, promote interdisciplinary open science, and ultimately support evidence-based decision making.
About text and data mining
Text and data mining (TDM) is emerging as a powerful tool for discovering value in data. It analyses structured and unstructured datasets and content at multiple levels and in many different dimensions in order to discover concepts and entities in the world, the patterns they may follow and the relations they engage in, and on this basis to annotate, index, classify and visualise such content.
In scholarly communication, text mining is already deployed in various scientific areas, notably the life sciences, to extract meaningful information and insights. These are used for a variety of purposes: discovering indexical information and automatically filling in metadata records; building or updating lexical and semantic resources such as nomenclatures and termino-ontological resources; linking concepts and entities through identity, similarity or other relations; and partially or fully automating parts of customary workflows, thus helping researchers and scientific data curators make sense of textual data. As such, it is a first step towards knowledge modeling and the integration of data from heterogeneous sources.