ScholarGate

Gene Ontology and Biological Databases

Interpreting genomes at scale requires a shared, machine-readable language for what genes do. The Gene Ontology provides that language — a structured vocabulary of molecular functions, biological processes, and cellular locations — while curated databases such as KEGG and Reactome supply the pathway and reaction knowledge against which genomic results are read.

Definition

The Gene Ontology (GO) is a structured, hierarchical controlled vocabulary that describes gene-product attributes across three domains: molecular function, biological process, and cellular component. Biological databases are curated repositories, such as KEGG, Reactome, and protein-association resources, that store the functional, pathway, and interaction knowledge used to annotate and interpret genomic data.

Scope

This topic covers controlled biological vocabularies and the major knowledge bases that store curated functional and pathway information: the structure and use of the Gene Ontology, how genes are annotated to ontology terms with evidence codes, and the role of pathway and interaction databases. It is a reference and educational subject and does not provide clinical guidance.

Core questions

  • How can the function of a gene product be described in a consistent, computable way?
  • What do the three Gene Ontology domains capture, and how are they organised?
  • How is the strength of an annotation indicated, for example through evidence codes?
  • Which databases hold pathway, reaction, and interaction knowledge, and how do they differ?

Key concepts

  • Controlled vocabulary and ontology
  • Molecular function, biological process, cellular component
  • Directed acyclic graph (DAG) structure of GO
  • Annotation and evidence codes
  • Pathway databases (KEGG, Reactome)
  • Protein interaction and association databases (STRING)

Mechanisms

The Gene Ontology organises terms as a directed acyclic graph: each specific term is linked to one or more general parent terms, so a term may have several parents, unlike a node in a strict tree. The graph spans three independent domains: molecular function (the biochemical activity of a gene product), biological process (the larger programme it contributes to), and cellular component (where it acts). Genes are linked to terms by annotations, and an annotation to a specific term also implies annotation to that term's more general ancestors. Each annotation is tagged with an evidence code that records whether the support is experimental, computational, or curator-inferred. Complementary databases capture knowledge the ontology does not: KEGG and Reactome encode pathways as networks of reactions and relations, and protein-association resources such as STRING aggregate evidence of functional links between proteins. Together these resources provide the curated gene sets and reference annotations that downstream enrichment and network methods consume.
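The graph structure described above can be sketched in a few lines of Python. This is a minimal illustration, not how any GO tool is implemented: the term names form a simplified toy fragment rather than a copy of a GO release, the gene names are hypothetical, and `PARENTS`, `ancestors`, and `genes_for_term` are names invented for this example.

```python
# Toy fragment of a GO-like graph: each child term maps to its parent terms.
# Assumption: simplified for illustration, not taken from an actual GO release.
PARENTS = {
    "protein kinase activity": {
        "kinase activity",
        "catalytic activity, acting on a protein",
    },
    "kinase activity": {"catalytic activity"},
    "catalytic activity, acting on a protein": {"catalytic activity"},
    "catalytic activity": {"molecular_function"},
    "molecular_function": set(),  # root of the domain
}

# Direct annotations as (gene, term, evidence code); gene names are hypothetical.
ANNOTATIONS = [
    ("geneA", "protein kinase activity", "IDA"),
    ("geneB", "kinase activity", "IEA"),
]


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # a term with two parents is visited once
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for_term(term, annotations=ANNOTATIONS):
    """Genes annotated to `term` directly or through a more specific term."""
    return sorted(
        {gene for gene, t, _ in annotations if t == term or term in ancestors(t)}
    )


if __name__ == "__main__":
    print(sorted(ancestors("protein kinase activity")))
    print(genes_for_term("catalytic activity"))
```

Because "protein kinase activity" has two parents here, the traversal tracks visited terms so that the shared ancestor "catalytic activity" is collected only once. Querying a general term returns genes annotated anywhere beneath it, which is how gene sets for enrichment analysis are usually assembled.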

Clinical relevance

Ontologies and curated databases are the shared infrastructure that makes genomic interpretation reproducible across studies, supplying the vocabulary and gene sets used in annotation, enrichment, and network analysis. They describe how biological knowledge is organised for computation and serve as reference resources rather than as a basis for individual diagnostic or treatment decisions.

History

The Gene Ontology was launched in 2000 by a consortium of model-organism databases to unify how gene function was described across species, and it became the de facto standard vocabulary for functional genomics. KEGG, begun in 1995 and described in a widely cited paper in 2000, formalised pathway knowledge as computable maps, and Reactome later added a manually curated, reaction-level pathway knowledgebase. Protein-association databases such as STRING extended curation to functional and physical interactions, completing an ecosystem of resources on which most enrichment and network analyses now depend.

Key figures

  • Michael Ashburner
  • Judith Blake
  • Minoru Kanehisa
  • Peter D'Eustachio

Seminal works

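  • Ashburner et al. (2000), "Gene Ontology: tool for the unification of biology", Nature Genetics
  • Kanehisa and Goto (2000), "KEGG: Kyoto Encyclopedia of Genes and Genomes", Nucleic Acids Research
  • Jassal et al. (2020), "The Reactome pathway knowledgebase", Nucleic Acids Research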

Frequently asked questions

What are the three domains of the Gene Ontology?
Molecular function (the biochemical activity of a gene product), biological process (the broader programme it contributes to), and cellular component (where in the cell it acts). These three domains are organised independently.

Why do Gene Ontology annotations carry evidence codes?
Evidence codes record how an annotation was supported — for example experimental evidence versus computational inference — so that users can judge how reliable a given gene-to-term assignment is.
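As a rough illustration of how evidence codes are used in practice, the sketch below groups a subset of GO evidence codes into broad classes and keeps only the annotations from the classes a user trusts. The grouping is simplified and lists only some codes; the gene names, the sample annotations, and the helper names `evidence_group` and `filter_by_group` are assumptions made for this example.

```python
# Simplified grouping of GO evidence codes (a subset only, for illustration).
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "statement": {"TAS", "NAS", "IC", "ND"},  # author or curator statements
    "electronic": {"IEA"},  # assigned automatically, without curator review
}


def evidence_group(code):
    """Map an evidence code to its broad class, or 'unknown' if not listed."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"


def filter_by_group(annotations, allowed_groups):
    """Keep (gene, term, code) annotations whose code is in an allowed class."""
    return [a for a in annotations if evidence_group(a[2]) in allowed_groups]


# Hypothetical annotations as (gene, term, evidence code).
annotations = [
    ("geneA", "protein kinase activity", "IDA"),
    ("geneB", "kinase activity", "IEA"),
    ("geneC", "nucleus", "ISS"),
    ("geneD", "DNA repair", "TAS"),
]

if __name__ == "__main__":
    # Keep only annotations backed by experimental evidence.
    print(filter_by_group(annotations, {"experimental"}))
```

Restricting an analysis to experimental codes trades coverage for reliability: electronically inferred annotations are typically far more numerous, so the right filter depends on the question being asked.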
