Downloads

All data in Rhea is freely available and can be downloaded from our FTP site in different formats. The complete current and previous releases (starting from release 100) can also be downloaded as tar archives.

Reactions

Reactions are available in these data formats:

  • BioPAX level 3
    This is a community-standard data exchange format for biological pathway data, provided in an OWL RDF/XML serialization. It covers the core Rhea data types, but some aspects of Rhea cannot be expressed in BioPAX (e.g. residues of Rhea macromolecules or polymerization indexes of Rhea polymers); these are added as "bp:COMMENT".
  • RXN
    This is an MDL CT file format that represents unidirectional processes. For this reason, bidirectional reactions and reactions with undefined directions cannot be described in this format.
  • RD
    This is an MDL CT file format that consists of a set of records, each of which defines a reaction (in RXN format) together with any associated data.
  • TSV (tab-separated values)
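
To illustrate the record structure of an RD file described above, the following is a minimal sketch that splits RD text into records. It assumes the usual MDL layout, in which each record starts with a line beginning with "$RFMT" and the associated data follow the RXN block as "$DTYPE" / "$DATUM" pairs. The function name split_rd_records is hypothetical, and the exact data fields present in the Rhea RD files are not assumed here.

```python
def split_rd_records(text):
    """Split the text of an RD file into a list of records.

    Assumptions (not taken from the Rhea documentation):
    - each record begins with a line starting with "$RFMT";
    - associated data appear as "$DTYPE <field>" / "$DATUM <value>" pairs.
    Simplification: only the first line of each $DATUM value is kept.
    """
    records = []
    current = None      # record currently being filled
    last_field = None   # field name from the most recent $DTYPE line
    for line in text.splitlines():
        if line.startswith("$RFMT"):
            current = {"rxn_lines": [], "data": {}}
            records.append(current)
            last_field = None
        elif current is None:
            continue  # file header lines before the first record
        elif line.startswith("$DTYPE"):
            last_field = line[len("$DTYPE"):].strip()
        elif line.startswith("$DATUM") and last_field is not None:
            current["data"][last_field] = line[len("$DATUM"):].strip()
        elif last_field is None:
            # Still inside the reaction block (RXN format) of this record.
            current["rxn_lines"].append(line)
    return records
```

Each returned record holds the raw RXN lines under "rxn_lines" and the associated data under "data", so the reaction block can be passed on to a chemistry toolkit if needed.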

Reaction participants

Information about the chemical entities that participate in Rhea reactions is available in the following files:

  • chebiId_name.tsv: A list of the participants (small molecules only) that are used in Rhea reactions as a tab-separated values file with the columns 1. participant ID, 2. participant name.
  • chebi.owl.gz: A representation of the ChEBI data in the Web Ontology Language (OWL) format, which can also be queried directly at the Rhea SPARQL endpoint. Please note that the file contains a snapshot of the ChEBI data that is synchronized with the Rhea (and UniProt) release cycle, and that the OWL representation differs slightly from ChEBI's OWL model: it lacks axioms for most synonyms, but has additional properties to facilitate queries by a) the participant names that are used in Rhea (http://purl.uniprot.org/core/name) and b) compounds with different protonation states (http://purl.obolibrary.org/obo/chebi#has_major_microspecies_at_pH_7_3).
  • TSV (tab-separated values)
    • rhea-chebi-smiles.tsv: Canonical SMILES for the subset of ChEBI used in Rhea, computed with RDKit using the ChEBI Molfile as input (beta release).
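
As an illustration of the two-column layout of chebiId_name.tsv described above (1. participant ID, 2. participant name), here is a minimal sketch that loads such a file into a dictionary using only the Python standard library. The function name load_participant_names is hypothetical, and the sketch assumes the file has no header row and has already been downloaded locally; if a header row is present, it would end up as an ordinary entry.

```python
import csv


def load_participant_names(handle):
    """Map participant ID (column 1) to participant name (column 2).

    Assumes tab-separated input without a header row (an assumption,
    not something stated in the file description).
    """
    names = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        names[row[0]] = row[1]
    return names


# Example usage (assumes a local copy of the file):
# with open("chebiId_name.tsv", encoding="utf-8", newline="") as handle:
#     names = load_participant_names(handle)
```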

Cross-references

Rhea cross-references to other databases are available as tab-separated values (TSV) files:

All Rhea TSV files (except UniProtKB) can be downloaded in a single archive: rhea-tsv.tar.gz
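
Once the archive has been downloaded, its contents can be inspected without unpacking it. The following is a minimal sketch using the Python standard library; the function name list_tsv_members is hypothetical, and both the local path and the internal layout of the archive are assumptions.

```python
import tarfile


def list_tsv_members(archive_path):
    """Return the sorted names of regular files ending in .tsv
    inside a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".tsv")
        )


# Example usage (assumes the archive is in the current directory):
# print(list_tsv_members("rhea-tsv.tar.gz"))
```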

NLP datasets

The curated EnzChemRED dataset is available as a set of BioC files, which can be downloaded in a single archive: EnzChemRED.tar.gz