Type: Workflow

FAIR Molecular Dynamics: Biomolecular Workflows with Metal Ion Precision

About

This use case aims to implement workflows that simulate the dynamics of biomolecular systems, with particular emphasis on complex scenarios such as protein folding and unfolding, ligand binding, and small-molecule diffusion into active sites. The workflows should handle multimolecular adducts robustly, capturing transient and cooperative interactions among multiple biomolecules.
Special attention is given to metal ion coordination sites, incorporating accurate, possibly quantum-derived, parameters to reflect their unique electronic and bonding interactions. Non-standard residues, cofactors, and ligands are parameterized using advanced techniques that ensure compatibility with classical force fields.
The workflow integrates automated topology generation to allow users to address a wide range of molecular systems. Validation against experimental data ensures the reliability and transferability of the simulations across diverse biological contexts.

Use Case Status Before Joining EOSC Data Commons

The MDDB initiative, which has been running for a couple of years, has agreed on a basic metadata schema for MD simulation data and integrates a few repositories under a common search and retrieval API. The MDDash portal, recently developed at CESNET, offers a Kubernetes-based environment for running MD simulations. The dashboard provides several methods for retrieving input data. Simulations are set up in a Jupyter notebook, which is included in the provenance record of the results. Optionally, an automated search for optimal simulation parameters (MPI processes, OMP threads, GPU usage, …) can be run. Finally, a production simulation is run, and the results are published to EOSC repositories. The associated metadata can be enriched automatically with the Metadump tool. The current implementation supports GROMACS as the MD engine, but the architecture allows extension to other engines.
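The automated parameter search mentioned above can be pictured as enumerating candidate run layouts and keeping the fastest one. The following is a minimal sketch under stated assumptions, not the MDDash implementation: the `Layout` type, the function names, and the idea of scoring each layout with a caller-supplied benchmark (for example, ns/day from a short trial run) are all hypothetical. The command line uses the conventional `gmx_mpi` binary name and the standard GROMACS `mdrun` options `-deffnm`, `-ntomp`, and `-nb`.

```python
from typing import Callable, Iterable, NamedTuple


class Layout(NamedTuple):
    """One candidate run configuration (hypothetical type for illustration)."""
    mpi_ranks: int
    omp_threads: int
    use_gpu: bool


def candidate_layouts(total_cores: int, gpu_available: bool) -> list[Layout]:
    """Enumerate rank/thread splits whose product equals the core count."""
    gpu_options = (False, True) if gpu_available else (False,)
    return [
        Layout(ranks, total_cores // ranks, gpu)
        for ranks in range(1, total_cores + 1)
        if total_cores % ranks == 0
        for gpu in gpu_options
    ]


def mdrun_command(layout: Layout, deffnm: str = "bench") -> list[str]:
    """Build an MPI-launched GROMACS mdrun command line for a layout."""
    return [
        "mpirun", "-np", str(layout.mpi_ranks),
        "gmx_mpi", "mdrun",
        "-deffnm", deffnm,
        "-ntomp", str(layout.omp_threads),
        "-nb", "gpu" if layout.use_gpu else "cpu",
    ]


def best_layout(layouts: Iterable[Layout],
                benchmark: Callable[[Layout], float]) -> Layout:
    """Return the layout with the highest benchmark score (e.g. ns/day)."""
    return max(layouts, key=benchmark)
```

In practice the `benchmark` callable would launch the command returned by `mdrun_command` for a short trial run and parse the reported performance; here it is left to the caller so that the sketch stays independent of any particular cluster setup.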

Objectives in the Project

  • Design an intuitive user interface (GUI/CLI) that simplifies system setup, parameter selection, and simulation launch, with guided workflows for common tasks.
  • Automate complex parameterization steps, including detection and treatment of non-standard residues, metal-ion sites, and multimolecular assemblies.
  • Implement standardized input/output formats and modular architecture to facilitate exchange with different simulation engines.
  • Provide accessible documentation and training resources, including tutorials, example cases, and troubleshooting guides for users at all experience levels.
  • Allow seamless deposition of parameters, data, and results.
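As an illustration of the detection step named in the objectives, the sketch below scans a PDB-format structure for metal ions and non-standard residues that would need special parameterization. It is a simplified example under stated assumptions, not the project's method: the function name `scan_pdb`, the short list of metal elements, and the choice to rely on the fixed-column PDB layout (residue name in columns 18-20, chain in column 22, residue number in columns 23-26, element in columns 77-78) are made for illustration only. Nucleic acids, modified residues, and mmCIF input are not handled.

```python
# The 20 standard amino acids; anything else is treated as non-standard here.
STANDARD_RESIDUES = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})
WATER_NAMES = frozenset({"HOH", "WAT"})
# Illustrative, deliberately incomplete list of metal element symbols.
METAL_ELEMENTS = frozenset({
    "ZN", "FE", "MG", "CA", "MN", "CU", "NI", "CO", "NA", "K",
})


def scan_pdb(pdb_text: str):
    """Return (metal_sites, nonstandard_residues) found in PDB-format text.

    Each entry is a (chain, residue_number, residue_name) tuple.
    """
    metals, nonstandard = set(), set()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        res_name = line[17:20].strip()
        chain = line[21:22].strip()
        res_seq = line[22:26].strip()
        element = line[76:78].strip().upper()  # empty if columns are missing
        site = (chain, res_seq, res_name)
        if element in METAL_ELEMENTS:
            metals.add(site)
        elif res_name not in STANDARD_RESIDUES and res_name not in WATER_NAMES:
            nonstandard.add(site)
    return sorted(metals), sorted(nonstandard)
```

The two returned lists could then drive separate treatment: metal sites routed to a dedicated (possibly quantum-derived) parameterization step, and other non-standard residues to general ligand or cofactor parameterization.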

Integration with EOSC Data Commons Services and Components

EOSC Matchmaker
Matchmaker will be able to discover MD simulation datasets published in MDDB-compliant repositories. Eventually, options to discover datasets published in generic repositories (Zenodo etc.) and described in the literature will be explored.

EOSC Data Player
A Dispatcher plugin will be developed to communicate with MDDash instances, setting up simulations and uploading input data. Options to offload large-scale production simulations to HPC resources via interLink will be explored. Technical means for efficient transfer of potentially very large MD datasets will also be explored; MDDash already uses OneData to publish results to selected EOSC repositories. Any issues with user authentication and credential deletion that arise will be identified and resolved.

Package for processing datasets
The exact content of the packages for the MD use case will be defined. These can include existing simulation results, as well as plain input data (e.g. a protein structure) to start simulations from scratch.

Expected Results

The workflow will enable accurate, reproducible simulations of complex biomolecular systems, including those with metal-ion centres and non-standard components. Users will be able to set up and launch simulations efficiently, with minimal manual intervention, across a wide range of biological scenarios.
