EOSC Data Player

One platform for unified, automated data analysis

EOSC Data Player simplifies and automates complex data analytics across distributed computing environments. It lets researchers access, process, and reuse data without having to manage the underlying technical infrastructure.

It bridges the gap between data sources, compute engines, and reproducible analysis workflows in the EOSC ecosystem.

Name: EOSC Data Player

What For: Simplify data analytics across the EOSC ecosystem

Intended Users: Experts and developers with knowledge about the Package for Processing Datasets

URL: Access the Service

EOSC Data Player operates behind the scenes of EOSC Data Commons services. It runs analytics tools and accesses data through unified interfaces. A plugin-based dispatcher distributes tasks across diverse computing resources and supports automated, on-demand orchestration of computing services.

EOSC Data Player supports key stages of the research data lifecycle:

Data Collection

Facilitates transparent access to existing datasets from heterogeneous repositories and user-specific sources.

Data Processing and Analysis

Automates the execution of data processing and analysis workloads across heterogeneous distributed compute resources.

Data Sharing and Reuse

Makes analysis workflows automatically reproducible and supports the reuse of data in subsequent analyses.

EOSC Data Player combines deployment tools, orchestrators, Virtual Research Environments (VREs), cloud and container frameworks, and data access services across the full computing continuum (Cloud, HTC, HPC).

Its dispatcher matches each Package for Processing Datasets with the most suitable platform to execute the corresponding analysis.
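The matching step can be pictured with a small sketch. It is purely illustrative: the source does not describe how the dispatcher ranks platforms, so the `Engine` class, the capability labels, and the "tightest fit" rule below are all assumptions, not the EOSC Data Player implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """Hypothetical description of a compute platform known to the dispatcher."""
    name: str
    capabilities: frozenset


def select_engine(requirements, engines):
    """Return the engine that covers every requirement with the fewest unused capabilities."""
    needed = frozenset(requirements)
    candidates = [e for e in engines if needed <= e.capabilities]
    if not candidates:
        raise LookupError(f"no engine satisfies {sorted(needed)}")
    # Assumed ranking rule: prefer the tightest fit, break ties by name
    # so the choice is deterministic.
    return min(candidates, key=lambda e: (len(e.capabilities - needed), e.name))


# Invented example platforms spanning Cloud, HTC and HPC.
ENGINES = [
    Engine("cloud-notebook", frozenset({"container", "interactive"})),
    Engine("htc-batch", frozenset({"container", "batch"})),
    Engine("hpc-cluster", frozenset({"container", "batch", "mpi", "gpu"})),
]

chosen = select_engine({"batch", "container"}, ENGINES)
print(chosen.name)  # htc-batch
```

Under this assumed rule, a package that only needs batch execution in a container lands on the HTC platform rather than occupying the larger HPC cluster.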

By integrating with existing engines and research community platforms, EOSC Data Player interprets the instructions in the Package for Processing Datasets (provided by the EOSC Matchmaker) and executes analyses automatically. The system hides technical complexity from users through an extensible plugin architecture that connects various compute engines and data access platforms.

The Dispatcher is the main entry point. It parses incoming packages, forwards them to the right compute engine, and coordinates data access through the Data Access layer. Depending on user needs, it can interact with existing engines or automatically deploy new ones.
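The three responsibilities described above (parse, coordinate data access, forward) might look roughly like the following sketch. The JSON package layout with `engine`, `tool`, and `datasets` keys is invented for illustration and is not the actual Package for Processing Datasets format; the `Dispatcher` class and its methods are likewise hypothetical.

```python
import json
from typing import Callable, Dict, List


class Dispatcher:
    """Toy dispatcher: parse a package, resolve its data, forward to an engine plugin."""

    def __init__(self, resolve: Callable[[str], str]):
        self._resolve = resolve  # stand-in for the Data Access layer
        self._engines: Dict[str, Callable[[str, List[str]], str]] = {}

    def register_engine(self, name: str, run: Callable[[str, List[str]], str]) -> None:
        """Plug in a compute engine under a name that packages can refer to."""
        self._engines[name] = run

    def dispatch(self, package_text: str) -> str:
        package = json.loads(package_text)              # 1. parse the incoming package
        engine = self._engines.get(package["engine"])
        if engine is None:
            raise KeyError(f"no plugin registered for engine {package['engine']!r}")
        inputs = [self._resolve(ref) for ref in package["datasets"]]  # 2. data access
        return engine(package["tool"], inputs)          # 3. forward to the engine


# Hypothetical wiring: a resolver that maps references to a mount point,
# and an "echo" engine that only reports what it would run.
dispatcher = Dispatcher(resolve=lambda ref: "/mnt/data/" + ref.rsplit("/", 1)[-1])
dispatcher.register_engine("echo", lambda tool, inputs: f"{tool} on {', '.join(inputs)}")

result = dispatcher.dispatch(json.dumps({
    "engine": "echo",
    "tool": "wordcount",
    "datasets": ["https://example.org/a.csv"],
}))
print(result)  # wordcount on /mnt/data/a.csv
```

Automatic deployment of a new engine is left out here; in this sketch it would replace the `KeyError` branch with a call that provisions the missing engine before forwarding.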

EOSC Data Commons will provide plugins for widely used compute engines and data access tools. The Data Access layer offers libraries that resolve dataset references and make them accessible to the engines. It includes a lightweight file system (FUSE/PyFilesystem) that exposes data as an RO-Crate, enabling easy packaging and use. Data access is plugin-based and extensible, with current and planned support for local files, web URLs, S3 object stores, and other data management solutions as required by different use cases.
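As a rough illustration of plugin-based data access, the sketch below picks a handler from the scheme of a dataset reference. The registry, the decorator, and the handler names are assumptions made for this example; the handlers only describe where the data lives and do not fetch anything, and the real Data Access libraries may work quite differently.

```python
from urllib.parse import urlsplit

# Hypothetical plugin registry: URL scheme -> handler function.
RESOLVERS = {}


def resolver(*schemes):
    """Register the decorated function as the handler for the given schemes."""
    def decorator(func):
        for scheme in schemes:
            RESOLVERS[scheme] = func
        return func
    return decorator


@resolver("", "file")
def local_file(parts):
    return {"kind": "local", "path": parts.path}


@resolver("http", "https")
def web_url(parts):
    return {"kind": "web", "url": parts.geturl()}


@resolver("s3")
def s3_object(parts):
    return {"kind": "s3", "bucket": parts.netloc, "key": parts.path.lstrip("/")}


def resolve(reference):
    """Dispatch a dataset reference to the plugin registered for its scheme."""
    parts = urlsplit(reference)
    try:
        handler = RESOLVERS[parts.scheme]
    except KeyError:
        raise ValueError(f"no data access plugin for scheme {parts.scheme!r}") from None
    return handler(parts)


print(resolve("s3://my-bucket/raw/run1.csv"))
print(resolve("/data/run1.csv"))
```

Supporting another data management solution then amounts to registering one more handler, without touching `resolve` or the engines that consume its output.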