
EOSC Matchmaker

Make research data findable, usable, and actionable

EOSC Matchmaker enhances research workflows by intelligently connecting datasets with suitable analytical tools.

It aggregates and enriches metadata from diverse repositories, enabling AI-driven discovery, automated pairing of data and tools, and seamless execution through orchestration services.

By supporting all key phases of the research data lifecycle, from planning to sharing, it accelerates data-driven science, improves metadata quality, and promotes FAIR and reusable research outputs.

Name

EOSC Matchmaker

What for

Easily find data and analysis tools

Intended Users

Researchers, Research Institutions

How does it work?

EOSC Matchmaker has two components behind the user interface: the Metadata Warehouse and Data Discovery component, and the Data Preparation component.

Metadata Warehouse

The Metadata Warehouse and Data Discovery component consists of: 

  • the multi-tiered metadata warehouse and its connectors towards data repositories of the federation, 
  • the AI-based search engine, 
  • tools for data FAIRness, 
  • the interfaces towards the Data Preparation component (Metadata Enrichment) and the federated EOSC Resource Catalogue in the EOSC EU Node.

The Metadata Warehouse aggregates metadata from the federation of repositories and exposes it to researchers through a search engine. Its innovative functionalities are:

  • Federation of data repositories: the Metadata Warehouse uses a multi-tiered schema, with a top-level structure of generic attributes and thematic sub-schemas that capture domain-specific details. It collects metadata using sector- or technology-specific crawlers that interact with repository interfaces. Crawlers for popular repository technologies will be co-developed with data providers in the project. Thanks to its plugin-based architecture, the system can easily integrate additional repositories. 
  • AI-based data discovery offers two types of search.
    The Basic Search retrieves datasets and tools matching the user’s keywords, including tools for exploring or analysing a given dataset of interest.
    The Advanced Search uses the Metadata Warehouse’s Knowledge Graph and a Large Language Model (LLM) to turn natural language questions into structured queries. This enables semantic search and allows users to ask complex questions, such as “Which datasets related to Metabolomics were published in Switzerland in the last three years?”, enhancing data exploration capabilities.
  • Data FAIRness: FAIR assessment tools are integrated with the Metadata Warehouse to improve data quality. 
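
A minimal sketch of how a multi-tiered record, a plugin-style crawler registry and the keyword-based Basic Search could fit together, assuming a simple in-memory store. The field names, the `register_crawler` decorator, the stub `demo` crawler and the endpoint URL are hypothetical illustrations. This page does not describe the actual warehouse schema or crawler interfaces, and the sketch does not model the Advanced Search (Knowledge Graph plus LLM).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class MetadataRecord:
    # Top tier: generic attributes shared by every record (hypothetical names).
    identifier: str
    title: str
    keywords: List[str]
    kind: str  # "dataset" or "tool"
    # Lower tier: thematic sub-schemas keyed by domain name.
    thematic: Dict[str, dict] = field(default_factory=dict)


# Plugin registry: repository technology -> crawler function.
CRAWLERS: Dict[str, Callable[[str], Iterable[MetadataRecord]]] = {}


def register_crawler(technology: str):
    """Decorator that registers a crawler plugin for one repository technology."""
    def decorator(func):
        CRAWLERS[technology] = func
        return func
    return decorator


@register_crawler("demo")
def crawl_demo(endpoint: str) -> Iterable[MetadataRecord]:
    # A real crawler would query the repository interface at `endpoint`;
    # this stub yields fixed records for illustration.
    yield MetadataRecord("ds-1", "Urine metabolite profiles", ["metabolomics", "NMR"],
                         "dataset", {"metabolomics": {"instrument": "NMR"}})
    yield MetadataRecord("tool-1", "Spectra viewer", ["NMR", "visualisation"], "tool")


def harvest(repositories: Dict[str, str]) -> List[MetadataRecord]:
    """Run the registered crawler for each repository (endpoint -> technology),
    skipping technologies that have no plugin yet."""
    records: List[MetadataRecord] = []
    for endpoint, technology in repositories.items():
        crawler = CRAWLERS.get(technology)
        if crawler is not None:
            records.extend(crawler(endpoint))
    return records


def basic_search(records: List[MetadataRecord], query: str) -> List[MetadataRecord]:
    """Return records whose title or keywords contain every query term
    (case-insensitive substring match on the generic tier only)."""
    terms = query.lower().split()
    hits = []
    for record in records:
        text = " ".join([record.title, *record.keywords]).lower()
        if all(term in text for term in terms):
            hits.append(record)
    return hits


records = harvest({"https://repo.example.org/api": "demo"})
print([r.identifier for r in basic_search(records, "nmr")])  # ['ds-1', 'tool-1']
```

Adding support for a new repository technology then means registering one more decorated crawler function, without changing `harvest` or the search code.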

The Metadata Warehouse provides the flexibility to define rich links between datasets and tools, and between datasets themselves. Linking datasets to tools enables “matchmaking”: recommending suitable analytical tools for a given dataset. Linking datasets to one another supports data lineage tracking, making relationships between datasets explicit and machine-readable. These explicit connections will improve dataset recommendations and help reduce data duplication.
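
The links described above can be pictured as typed edges in a graph. The sketch below illustrates that idea under several assumptions and is not the Metadata Warehouse's Knowledge Graph. It stores (source, relation, target) edges in memory and uses the hypothetical relation names `processable_by` and `derived_from`. It recommends tools by following the first relation and reconstructs lineage by walking the second.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class LinkGraph:
    """In-memory store of typed links: (source, relation, target)."""

    def __init__(self) -> None:
        self.out_edges: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def link(self, source: str, relation: str, target: str) -> None:
        self.out_edges[source].append((relation, target))

    def tools_for(self, dataset: str) -> List[str]:
        """Matchmaking: tools linked to the dataset by 'processable_by'."""
        return [t for rel, t in self.out_edges.get(dataset, []) if rel == "processable_by"]

    def lineage(self, dataset: str) -> List[str]:
        """Upstream datasets reachable through 'derived_from' links, depth-first.
        The visited set guards against cycles."""
        seen: Set[str] = {dataset}
        upstream: List[str] = []
        stack = [dataset]
        while stack:
            current = stack.pop()
            for rel, target in self.out_edges.get(current, []):
                if rel == "derived_from" and target not in seen:
                    seen.add(target)
                    upstream.append(target)
                    stack.append(target)
        return upstream


graph = LinkGraph()
graph.link("ds-clean", "derived_from", "ds-raw")
graph.link("ds-raw", "derived_from", "ds-instrument")
graph.link("ds-clean", "processable_by", "tool-stats")
print(graph.tools_for("ds-clean"))  # ['tool-stats']
print(graph.lineage("ds-clean"))    # ['ds-raw', 'ds-instrument']
```

Keeping the relation type explicit on every edge lets one store serve both purposes: matchmaking queries filter on one relation, and lineage queries traverse another.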

Data Preparation

The Data Preparation component includes the Catalogue of Tools, the Packaging Hub and the Request Packager. It defines all inputs and steps needed to start a data analysis and compiles them into a Package for processing datasets, which serves as the main output of the EOSC Matchmaker. It populates the Catalogue of Tools and links data to software, enriching the Metadata Warehouse with related analytical tools and provenance information.
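
To make the notion of a Package more concrete, the sketch below models a Catalogue of Tools entry and a Package as plain Python data classes serialised to JSON. All field names (`accepted_types`, `hardware`, `launch_command`, `permissions`) and example values are hypothetical. They were chosen only to mirror attributes mentioned in this section, and the real Package format is not specified here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ToolEntry:
    """Catalogue of Tools entry: registry metadata extended with run-time needs."""
    tool_id: str
    name: str
    accepted_types: List[str]   # MIME types the tool can read
    hardware: Dict[str, int]    # e.g. {"cpus": 2, "memory_gb": 8}
    launch_command: str         # command used to start the tool


@dataclass
class DatasetAccess:
    """How to reach one dataset, public or private (hypothetical fields)."""
    uri: str
    protocol: str               # e.g. "https"
    permissions: List[str] = field(default_factory=list)  # e.g. ["read"]


@dataclass
class Package:
    """Everything needed to start one analysis: data access plus the tool to run."""
    datasets: List[DatasetAccess]
    tool: ToolEntry

    def to_json(self) -> str:
        # asdict() recursively converts the nested dataclasses to plain dicts.
        return json.dumps(asdict(self), indent=2)


tool = ToolEntry(
    tool_id="tool-stats",
    name="Stats notebook",
    accepted_types=["text/csv"],
    hardware={"cpus": 2, "memory_gb": 8},
    launch_command="jupyter lab",
)
package = Package(
    datasets=[DatasetAccess("https://repo.example.org/ds-1.csv", "https", ["read"])],
    tool=tool,
)
print(json.loads(package.to_json())["tool"]["hardware"])  # {'cpus': 2, 'memory_gb': 8}
```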

  • Catalogue of Tools: collects information on tools from existing registries and extends these records with attributes required to run analyses, such as hardware needs or launch commands. The Catalogue will also connect the EOSC EU Node to the future Tools Hub as an Application Provider.
  • Pairing data and software: the Packaging Hub links datasets with tools using two methods: (1) a deterministic approach based on an extended MIME type hierarchy, and (2) AI-based suggestions that propose or confirm matches depending on confidence levels. The AI model will be trained further using successful dataset-tool pairings from the EOSC Data Player.
  • Package for processing datasets: created by the Request Packager, this package includes all necessary protocols, URIs, and permissions to access both private and public datasets, along with the required code or applications. The Request Packager receives matching tools from the Packaging Hub and updates the Metadata Warehouse with the completed pairing.
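
The deterministic pairing method can be sketched as a walk up a type hierarchy: a tool matches a dataset if it accepts the dataset's MIME type or any of that type's ancestors. The hierarchy fragment below is a hypothetical stand-in for the extended hierarchy. The AI step is reduced to a function that triages precomputed confidence scores. Both thresholds are assumptions, as is the reading that high confidence confirms a match while medium confidence only proposes it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fragment of an extended MIME type hierarchy (child -> parent).
MIME_PARENT: Dict[str, Optional[str]] = {
    "application/octet-stream": None,
    "text/plain": "application/octet-stream",
    "text/csv": "text/plain",
    "application/json": "text/plain",
    "application/geo+json": "application/json",
    "application/x-hdf5": "application/octet-stream",
}

# Hypothetical confidence thresholds for AI-based suggestions.
CONFIRM_THRESHOLD = 0.9
PROPOSE_THRESHOLD = 0.5


def ancestors(mime: str) -> List[str]:
    """Return the type itself followed by its parents up to the root.
    Types missing from the hierarchy yield a single-element chain."""
    chain: List[str] = []
    current: Optional[str] = mime
    while current is not None:
        chain.append(current)
        current = MIME_PARENT.get(current)
    return chain


def deterministic_match(dataset_type: str, tools: Dict[str, List[str]]) -> List[str]:
    """A tool matches if it accepts the dataset's type or any ancestor type."""
    chain = set(ancestors(dataset_type))
    return [name for name, accepted in tools.items() if chain.intersection(accepted)]


def triage_ai_suggestions(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split AI confidence scores into confirmed matches and proposals;
    scores below PROPOSE_THRESHOLD are dropped."""
    confirmed = [t for t, s in scores.items() if s >= CONFIRM_THRESHOLD]
    proposed = [t for t, s in scores.items() if PROPOSE_THRESHOLD <= s < CONFIRM_THRESHOLD]
    return confirmed, proposed


tools = {
    "csv-stats": ["text/csv"],
    "text-viewer": ["text/plain"],
    "hdf-browser": ["application/x-hdf5"],
}
print(deterministic_match("text/csv", tools))  # ['csv-stats', 'text-viewer']
print(triage_ai_suggestions({"csv-stats": 0.95, "plotter": 0.7, "hdf-browser": 0.2}))
# (['csv-stats'], ['plotter'])
```

Under this design, a generic tool that accepts a parent type such as `text/plain` is offered for every more specific subtype, while a specialised tool matches only its own type and that type's descendants.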