Type: Data Repository + Virtual Research Environment (VRE)

Gene Expression Enrichment with the TopAnat tool

About

TopAnat is an anatomical expression enrichment tool that associates genes with the anatomical structures (e.g., brain) in which they are expressed, based on their expression levels. It is a domain-specific data analysis tool for the life sciences that could be integrated into a VRE such as Galaxy.
TopAnat relies on the Bgee database. Bgee integrates and curates multiple transcriptomics datasets produced by researchers using different technologies, including single-cell and bulk RNA-Seq. Bgee focuses on metazoans and currently integrates 52 animal species.
The Bgee infrastructure is recognised as a Global Core Biodata Resource, i.e. a resource that is “of fundamental importance to the wider biological and life sciences community”. At the European level, it is recognised as an ELIXIR Recommended Interoperability Resource, that is, a resource “that facilitates the FAIR-supporting activities in scientific research”.

Use Case Status Before Joining EOSC Data Commons

Use case scenario: a list of steps the users are expected to perform on TopAnat.

Analyse a list of genes associated with autism and epilepsy in humans. This gene list is either explicitly provided by the user or generated as a result of a data discovery process (e.g., in the future, EOSC Matchmaker could suggest this list: Autism spectrum-associated genes from Satterstrom et al., 2020).
This list typically includes gene symbols/names (e.g., INS represents the insulin gene). Because of this, our system is expected to map the input gene identifiers to the IDs expected by the TopAnat tool, including ID disambiguation. In addition to the gene list, the user can optionally provide a set of parameters for the analysis.
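The mapping and disambiguation step can be sketched as follows. This is a minimal illustration, not the project's implementation: the lookup table, the `map_symbols` function and the symbol `GENE_X` with its placeholder IDs are hypothetical, and a real pipeline would query a mapping service rather than a hard-coded dictionary. Only the TP53 mapping is taken from this page. The sketch also assumes that symbols can be normalised to upper case, which holds for human gene symbols but not for every species.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table (symbol -> candidate gene IDs).
# TP53 is the example given on this page; GENE_X and its IDs are placeholders.
SYMBOL_TO_IDS: Dict[str, List[str]] = {
    "TP53": ["ENSG00000141510"],
    "GENE_X": ["PLACEHOLDER_ID_A", "PLACEHOLDER_ID_B"],
}


def map_symbols(
    symbols: List[str], table: Dict[str, List[str]]
) -> Tuple[Dict[str, str], Dict[str, List[str]], List[str]]:
    """Split input symbols into uniquely mapped, ambiguous and unmapped."""
    mapped: Dict[str, str] = {}
    ambiguous: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for raw in symbols:
        symbol = raw.strip().upper()  # assumption: upper-case symbols
        candidates = table.get(symbol, [])
        if len(candidates) == 1:
            mapped[symbol] = candidates[0]
        elif candidates:
            # More than one candidate: leave the choice to the user.
            ambiguous[symbol] = candidates
        else:
            unmapped.append(symbol)
    return mapped, ambiguous, unmapped


mapped, ambiguous, unmapped = map_symbols([" tp53", "GENE_X", "NOPE"], SYMBOL_TO_IDS)
print(mapped, ambiguous, unmapped)
```

Returning ambiguous and unmapped symbols separately, instead of silently dropping them, lets the user resolve them before the analysis runs.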
Once executed, TopAnat returns a list of anatomical structures in which the expression of these genes is enriched relative to a background set of genes.
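To make "enriched relative to a background" concrete, the sketch below computes a one-sided hypergeometric p-value for a single anatomical structure. The choice of this test and the function name `enrichment_p_value` are assumptions made for illustration; this page does not state which statistic TopAnat applies, and the numbers in the example are invented.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric tail probability P(X >= k).

    N: genes in the background set
    K: background genes expressed in the structure
    n: genes in the user's list (assumed to be a subset of the background)
    k: genes from the user's list expressed in the structure
    """
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Invented numbers: 8 of 20 listed genes are expressed in a structure
# where 100 of 1000 background genes are expressed.
print(enrichment_p_value(k=8, n=20, K=100, N=1000))
```

A small p-value indicates that the listed genes are expressed in the structure more often than expected from the background alone. In practice such a test would be repeated for every structure and corrected for multiple testing.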
Notably, TopAnat can be slow due to the propagation of calls in the anatomical ontology, which is recalculated on the fly by the topGO package to enable decorrelation. TopAnat is available as a web tool and in the BgeeDB R package.
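The cost of propagating calls can be illustrated with a toy example. In the sketch below, a gene called as expressed in a structure is also counted as expressed in every ancestor of that structure. The three-term hierarchy, the gene names and the `propagate_calls` function are hypothetical, and the code does not reproduce how topGO performs this step; it only shows why the work grows with the size and depth of the ontology.

```python
from typing import Dict, List, Set

# Toy hierarchy (child -> parents) and direct expression calls; both invented.
PARENTS: Dict[str, List[str]] = {
    "cerebellum": ["brain"],
    "brain": ["nervous system"],
    "nervous system": [],
}
DIRECT_CALLS: Dict[str, Set[str]] = {
    "cerebellum": {"geneA"},
    "brain": {"geneB"},
}


def propagate_calls(
    direct: Dict[str, Set[str]], parents: Dict[str, List[str]]
) -> Dict[str, Set[str]]:
    """Copy each structure's genes to all of its ancestors."""
    propagated = {term: set(genes) for term, genes in direct.items()}
    for term, genes in direct.items():
        stack = list(parents.get(term, []))
        seen: Set[str] = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue  # a term can be reached through several paths
            seen.add(ancestor)
            propagated.setdefault(ancestor, set()).update(genes)
            stack.extend(parents.get(ancestor, []))
    return propagated


print(propagate_calls(DIRECT_CALLS, PARENTS))
```

Every directly annotated structure triggers a walk up to the root, so a large anatomical ontology with many annotated terms makes this step expensive when it is recomputed for each analysis.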

Objectives in the Project

  • Bgee data and the TopAnat tool are discoverable via the EOSC Data Commons services.
  • TopAnat is available as a tool that can be paired with data through the EOSC data and software pairing. Optionally, Bgee data can also be made available for pairing with other tools.
  • TopAnat is used to execute a domain-specific data analysis for gene expression enrichment, if the expected input data format is respected.
  • The FAIRness assessment service is applied to the Bgee repository data.
  • Integrating TopAnat into a VRE such as Galaxy for deployment and data orchestration would be a plus. This may include gene ID mappers and interoperation with other tools (i.e., data input or output) to perform an analysis workflow on the researcher’s data. We will also provide harmonised metadata related to the TopAnat tool where applicable.

Integration with EOSC Data Commons Services and Components

Expected Results
  • Bgee data and the TopAnat tool will be available and discoverable within the EOSC Matchmaker.
  • Integration of the TopAnat tool with the EOSC Data Player to execute Packages for Processing Datasets found by using the EOSC Matchmaker. For example, retrieve the necessary input data (such as gene names associated with Alzheimer’s disease) and map them to the Ensembl identifiers that TopAnat expects in order to run an analysis.
  • Apply the FAIR Assessment Toolkit to evaluate and improve the FAIRness of the Bgee repository data.
  • Create Packages for Processing Datasets to reproduce a well-known analysis. For example, data preparation is needed to run an analysis with TopAnat: transform a gene list into a specific format and encoding (e.g. ID mappings, TP53 => ENSG00000141510) and ensure consistency (e.g. all genes belong to the same species).
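The consistency check in the last item could look like the sketch below. It assumes that all input IDs follow the Ensembl stable gene ID pattern (`ENS`, an optional species code, `G`, then 11 digits), so that the prefix can stand in for the species; gene IDs from other sources would need a different check. The function name `prepare_gene_list` is hypothetical, and the mouse-prefixed ID appears only to illustrate a mixed-species list.

```python
import re
from typing import List

# Assumption: Ensembl-style gene IDs, e.g. ENSG... (human) or ENSMUSG... (mouse).
ENSEMBL_GENE_ID = re.compile(r"^(ENS[A-Z]*G)(\d{11})$")


def prepare_gene_list(ids: List[str]) -> List[str]:
    """Trim, validate and de-duplicate IDs; reject mixed-species lists."""
    cleaned: List[str] = []
    prefixes = set()
    for raw in ids:
        gene_id = raw.strip()
        match = ENSEMBL_GENE_ID.match(gene_id)
        if match is None:
            raise ValueError(f"not an Ensembl-style gene ID: {raw!r}")
        prefixes.add(match.group(1))
        if gene_id not in cleaned:  # keep first occurrence, preserve order
            cleaned.append(gene_id)
    if len(prefixes) > 1:
        raise ValueError(f"IDs from several species prefixes: {sorted(prefixes)}")
    return cleaned


print(prepare_gene_list(["ENSG00000141510", " ENSG00000141510 "]))

try:
    prepare_gene_list(["ENSG00000141510", "ENSMUSG00000000001"])
except ValueError as error:
    print(error)
```

Failing early with an explicit error keeps a malformed or mixed-species list from reaching the analysis step.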
