Type: Data Repository + Virtual Research Environment (VRE)

DaSCH Service Platform

DaSCH (Swiss National Data and Service Center for the Humanities)

About

The DaSCH Service Platform (DSP) is an open-source, standards-based infrastructure developed by the Swiss National Data and Service Center for the Humanities (DaSCH). It is designed for the long-term preservation, management, and reuse of complex humanities research data, particularly data with intricate internal structures. DSP-APP is the web-based user interface of the VRE; it lets researchers create and manage data models, search, browse and edit data, and annotate and link resources. DSP-API is a RESTful API that manages data storage and retrieval. Non-binary data is stored as Resource Description Framework (RDF) triples in a dedicated triplestore, which supports complex data structures and relationships. Binary media files (e.g., images, audio, video, documents) are converted to specialised archival formats and served by the IIIF-compatible media server SIPI, with their metadata maintained in the triplestore.
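Because SIPI is IIIF-compatible, images can be requested with the URL pattern of the IIIF Image API. The following is a minimal sketch of that pattern, assuming Image API version 3 (where `max` is the full-size keyword); the base URL and the identifier are hypothetical placeholders, not real DSP addresses.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    The path order {identifier}/{region}/{size}/{rotation}/{quality}.{format}
    follows the IIIF Image API. base_url and identifier are hypothetical.
    """
    # The identifier is percent-encoded, including any slashes it contains.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


if __name__ == "__main__":
    # Request a 512-pixel-wide derivative of a (made-up) archived image.
    print(iiif_image_url("https://iiif.example.org/0001", "page 12.jp2",
                         size="512,"))
```

The archival master stays untouched on the server; the client only changes the URL segments to obtain a different crop, size, or format.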

Use Case Status Before Joining EOSC Data Commons

Data quality poses a significant hurdle: humanities research data arrives in highly heterogeneous formats and with substantial quality issues that demand extensive manual intervention. Each project typically requires 2-6 weeks of dedicated data specialist time for cleansing and transformation. Manual cleansing is needed most for digital scholarly editions delivered as Word documents, unstructured bibliographies, inconsistent naming conventions, and free-text fields that require a controlled vocabulary. The platform also lacks integrated tooling to assess data FAIRness and has no OAI-PMH API for metadata harvesting. Finally, the Virtual Research Environment (VRE) and repository functionalities are merged into a single system, which limits specialised optimisation.

Objective in the Project

Implement AI-powered tools for controlled vocabulary creation from free-text fields, disambiguation and Named Entity Recognition, consistency checking, fuzzy matching for linking tables and fixing variations, and pattern recognition in semi-structured strings like bibliographies and separated files.
Implement the FAIR Assessment Toolkit and ensure continuous FAIRness evaluation and improvement.
Extend data and metadata API coverage by implementing the OAI-PMH API for metadata harvesting and enhancing data access APIs for broader interoperability.
Integrate fully with EOSC Matchmaker for data discovery and with EOSC Data Player for processing workloads, and support Packages for Processing Datasets.
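The fuzzy matching objective above can be illustrated with a small rule-based baseline. This is only a sketch using Python's standard `difflib`, not the AI-powered tooling the project plans to build; the vocabulary, the raw values, and the 0.8 similarity cutoff are all made-up assumptions.

```python
from difflib import get_close_matches


def normalise(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def map_to_vocabulary(values, vocabulary, cutoff=0.8):
    """Map free-text values to the closest controlled-vocabulary term.

    Values with no candidate above the cutoff map to None so that a
    data specialist can review them manually.
    """
    lookup = {normalise(term): term for term in vocabulary}
    mapping = {}
    for value in values:
        hits = get_close_matches(normalise(value), list(lookup),
                                 n=1, cutoff=cutoff)
        mapping[value] = lookup[hits[0]] if hits else None
    return mapping


if __name__ == "__main__":
    vocabulary = ["Manuscript", "Photograph", "Letter"]  # hypothetical terms
    raw_values = ["manuscript ", "Manuscrpt", "photgraph", "Diary"]
    for raw, term in map_to_vocabulary(raw_values, vocabulary).items():
        print(repr(raw), "->", term)
```

Returning `None` instead of a weak match keeps the human in the loop: uncertain values are flagged rather than silently assigned to the wrong term.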

Integration with EOSC Data Commons Services and Components

Expected Results

Integration with the EOSC Matchmaker will make DaSCH repository data findable: metadata will be exposed through standardised APIs for harvesting, which enhances the discoverability of humanities research data across EOSC.
The EOSC Data Player will pair DaSCH data with tools from EOSC Matchmaker and execute Packages for Processing Datasets. This integration involves implementing the capability to send workloads for remote processing, enabling advanced analytics on DaSCH data without local processing constraints.
Finally, the FAIR Assessment Toolkit is designed to evaluate and improve the FAIRness of hosted data by embedding assessment tools within the data lifecycle, leading to continuous FAIRness monitoring and improvement recommendations.

Technical Integration Plan

Timeline

By November 2025

The initial phase will analyse the metadata standards relevant to humanities data. This includes an evaluation of OAI-PMH implementation strategies and metadata crosswalks, an assessment of existing FAIR evaluation frameworks suited to humanities repositories, and an exploration of AI-based tools for data cleansing in cultural heritage contexts.

By June 2026

By this milestone, a foundational OAI-PMH endpoint will be in place and metadata exposure to EOSC Matchmaker will have begun. The FAIR Assessment Toolkit will be applied to 3-5 sample projects, and a prototype AI-based data cleansing tool will be developed for one specific use case, such as bibliography parsing.
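To make the bibliography parsing use case concrete, here is a minimal rule-based sketch that extracts fields from entries written in one assumed style, "Authors (Year). Title. Publisher." It is a regular-expression baseline rather than the AI-based prototype itself, and the sample entry is invented for illustration.

```python
import re

# Assumed entry style: "Authors (Year). Title. Publisher."
ENTRY = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+(?P<publisher>.+?)\.?$"
)


def parse_entry(line):
    """Return the parsed fields as a dict, or None if the line does not fit."""
    match = ENTRY.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    entries = [
        "Doe, J. (1998). Letters from Basel. Example Press.",  # invented
        "an entry with no recognisable structure",
    ]
    for entry in entries:
        print(parse_entry(entry))
```

A title that contains an abbreviation followed by a space would be split too early, which is exactly the kind of variation that motivates a learned model; entries returning `None` would go to manual review.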

By January 2028

This milestone marks the deployment of a production-ready OAI-PMH API that supports multiple metadata formats. EOSC Matchmaker and the FAIR Assessment Toolkit will be fully integrated into the data curation workflow, and AI-powered cleansing tools will be available to address 2-3 common data quality issues. Initial integration of the EOSC Data Player for basic processing tasks is also planned, alongside a preliminary separation of the VRE and the repository if deemed feasible.
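From a harvester's point of view, support for multiple metadata formats means choosing a `metadataPrefix` in each request. The sketch below builds an OAI-PMH `ListRecords` request and reads record identifiers from a response; the verb, the parameter names, and the XML namespace come from the OAI-PMH 2.0 protocol, while the base URL and the trimmed sample response are hypothetical, since the DaSCH endpoint does not exist yet.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, set_spec=None):
    """Build a ListRecords request URL for the given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(xml_text):
    """Extract the record identifiers from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    return [el.text
            for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Trimmed, made-up response used only to exercise the parser.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:0001</identifier>
        <datestamp>2026-01-15</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai",
                           from_date="2026-01-01"))
    print(record_identifiers(SAMPLE_RESPONSE))
```

`oai_dc` (unqualified Dublin Core) is the one format every OAI-PMH repository must offer, so it is a reasonable default; richer formats would be requested by passing a different prefix.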
