ScholarGate

Comparative Genomics and Ortholog Inference

Much of what is known about human gene function was first learned in other organisms. Comparative genomics exploits this by comparing genomes across species, and ortholog inference identifies the corresponding genes — those descended from a common ancestral gene by speciation — so that functional knowledge can be transferred from model organisms to humans on a principled evolutionary basis.

Definition

Comparative genomics is the comparison of genome sequences and gene content across species to identify conserved and divergent features. Ortholog inference is the identification of orthologs, that is, genes in different species that derive from a single gene in their last common ancestor through speciation. Orthologs are distinct from paralogs, which arise by gene duplication.
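
As a minimal sketch of this distinction, assume a rooted gene tree is already available and every leaf is labelled with its species. The species-overlap heuristic used here is an assumption chosen for brevity, not a method named in this entry: an internal node whose two subtrees share a species is treated as a duplication, otherwise as a speciation. Gene pairs whose last common node is a speciation are labelled orthologs and the rest paralogs. The tree, gene names, and function names are hypothetical.

```python
from itertools import product

# A gene tree as nested 2-tuples. A leaf is (gene_name, species);
# an internal node is (left_subtree, right_subtree).
# Hypothetical family: one duplication before the human/mouse split
# produced the "A" and "B" copies found in both species.
GENE_TREE = (
    (("HsA", "human"), ("MmA", "mouse")),
    (("HsB", "human"), ("MmB", "mouse")),
)

def is_leaf(node):
    # Leaves start with a gene name (a string); internal nodes start with a tuple.
    return isinstance(node[0], str)

def leaves(node):
    """Return all (gene, species) leaves below a node."""
    if is_leaf(node):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def classify_pairs(node, out=None):
    """Label each gene pair by the event inferred at its last common node."""
    if out is None:
        out = {}
    if is_leaf(node):
        return out
    left, right = node
    left_leaves, right_leaves = leaves(left), leaves(right)
    left_species = {species for _, species in left_leaves}
    right_species = {species for _, species in right_leaves}
    # Shared species on both sides implies a duplication (species-overlap rule).
    label = "paralogs" if left_species & right_species else "orthologs"
    for (gene_1, _), (gene_2, _) in product(left_leaves, right_leaves):
        out[frozenset((gene_1, gene_2))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out

relations = classify_pairs(GENE_TREE)
for pair, label in sorted((sorted(p), l) for p, l in relations.items()):
    print(pair, label)
```

In this toy tree HsA and MmA come out as orthologs, while HsA and MmB come out as paralogs even though they sit in different species, which is why species membership alone cannot settle the question.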

Scope

This topic covers the evolutionary concepts that underlie cross-species comparison (homology, orthology, and paralogy), the methods used to infer orthologous relationships, and the role of these inferences in transferring function and studying conservation. It is intended as reference and educational material and does not provide clinical guidance.

Core questions

  • What distinguishes orthologs from paralogs, and why does the difference matter for inferring function?
  • How are orthologous relationships inferred from sequence data?
  • When can function established in one species be transferred to another?
  • What can patterns of conservation and divergence reveal about gene function?

Key concepts

  • Homology, orthology, and paralogy
  • Speciation versus duplication events
  • Sequence alignment and similarity search
  • Reciprocal best hits and orthologous groups
  • Functional annotation transfer
  • Sequence conservation and constraint

Mechanisms

Comparative inference begins with sequence alignment, the operation that quantifies similarity between sequences, formalised by the Needleman-Wunsch global alignment algorithm and its successors. Because similarity alone does not distinguish the two kinds of homology, Fitch's distinction is central: orthologs diverge at a speciation event and tend to retain the ancestral function, whereas paralogs diverge by duplication and may acquire new roles. Inference methods operationalise this distinction in several ways: by reciprocal best matches between genomes, by clustering proteins into orthologous groups as in the early Clusters of Orthologous Groups, or by graph-based clustering as in OrthoMCL. Once orthologs are identified, functional annotation can be transferred between species with appropriate caution, and the pattern of conservation across many genomes suggests which sequences are likely to be under functional constraint.
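
The reciprocal-best-match idea can be sketched in a few lines, assuming a similarity search has already been run in both directions and its scores collected into nested dictionaries. The gene names, scores, and function names below are hypothetical, and the sketch omits the filtering that real pipelines usually apply (score thresholds, alignment coverage, tie handling).

```python
def best_hit(hits):
    """Map each query gene to its highest-scoring subject gene.

    `hits` has the shape {query: {subject: score}}. Ties are broken by
    dictionary order, which is an arbitrary simplification.
    """
    return {
        query: max(subjects, key=subjects.get)
        for query, subjects in hits.items()
        if subjects
    }

def reciprocal_best_hits(hits_ab, hits_ba):
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hit(hits_ab)
    best_ba = best_hit(hits_ba)
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in best_ab.items()
        if best_ba.get(gene_b) == gene_a
    )

# Hypothetical similarity scores, genome A (human-like) against genome B
# (mouse-like) and the reverse search.
hits_ab = {
    "hsA": {"mmA": 950.0, "mmB": 610.0},
    "hsB": {"mmB": 900.0, "mmA": 600.0},
    "hsC": {"mmA": 300.0},
}
hits_ba = {
    "mmA": {"hsA": 940.0, "hsB": 590.0, "hsC": 310.0},
    "mmB": {"hsB": 905.0, "hsA": 620.0},
}

pairs = reciprocal_best_hits(hits_ab, hits_ba)
print(pairs)
```

Here hsC finds mmA as its best match, but mmA prefers hsA, so hsC is left without a partner. Requiring agreement in both directions is what makes the criterion conservative, at the cost of missing one-to-many relationships created by lineage-specific duplication.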

Clinical relevance

Ortholog inference is what allows discoveries in model organisms to inform understanding of human genes, and conservation across species is a widely used signal of functional importance in variant interpretation. The topic describes how cross-species relationships are inferred and used; it is intended as reference material, not as a basis for individual diagnostic or treatment decisions.

History

Walter Fitch's 1970 distinction between orthologs and paralogs gave evolutionary biology the vocabulary needed to reason about gene function across species, and it appeared in the same year as the Needleman-Wunsch sequence-alignment algorithm (1970). As complete genomes became available in the late 1990s, the Clusters of Orthologous Groups framework (1997) systematised cross-genome comparison, and graph-clustering tools such as OrthoMCL (2003) extended ortholog inference to the more complex gene families of eukaryotes, making large-scale function transfer routine.

Debates

Does orthology reliably predict equivalent function?
Transferring function across species assumes orthologs retain the ancestral role, but duplication, loss, and divergence complicate this; distinguishing orthologs from paralogs and judging when function is conserved remain central methodological challenges.

Key figures

  • Walter Fitch
  • Eugene Koonin
  • David Lipman
  • David Roos

Seminal works

  • fitch-1970
  • tatusov-1997
  • li-2003

Frequently asked questions

What is the difference between orthologs and paralogs?
Orthologs are genes in different species that descend from a single gene in their last common ancestor through speciation and often keep the same function; paralogs arise by gene duplication within a lineage and may take on new functions. The distinction matters because function is transferred more reliably between orthologs.

Why is sequence conservation across species used to interpret variants?
Positions that stay unchanged across many species are likely to be under functional constraint, so a variant at a highly conserved site is more likely to disrupt function than one at a position that varies freely between species.
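
To make the idea of a conserved position concrete, the following toy sketch scores each column of a multiple sequence alignment by the fraction of sequences that share its most common residue. This is an illustrative simplification, not one of the phylogeny-aware scores used in practice; the aligned sequences and the function name are hypothetical, and gap characters are counted like any other symbol.

```python
from collections import Counter

def column_conservation(alignment):
    """Return, per column, the fraction of sequences sharing the top residue."""
    if not alignment:
        return []
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*alignment):
        # Count of the single most frequent symbol in this column.
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores

# Hypothetical aligned protein fragments from four species.
alignment = [
    "MKTAY",
    "MKSAY",
    "MRTAY",
    "MKTGY",
]

scores = column_conservation(alignment)
for position, score in enumerate(scores, start=1):
    print(position, score)
```

Columns 1 and 5 are identical in every sequence and score 1.0, while the three middle columns each score 0.75. Under the reasoning above, a variant at position 1 or 5 would deserve more scrutiny than one at a position that already varies between species.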
