RNA Sequencing Methods and Technologies

RNA sequencing (RNA-seq) measures the transcriptome by converting RNA into a sequencing library and counting the resulting reads that map back to genes and transcripts. By sampling transcripts directly rather than hybridizing them to predefined probes, RNA-seq can quantify expression across a wide dynamic range, discover previously unannotated transcripts and splice junctions, and resolve isoform structure.

Definition

RNA sequencing is a high-throughput sequencing approach in which RNA is reverse-transcribed (or directly sequenced) and the resulting reads are mapped and counted to quantify transcript abundance and characterize transcript structure across the transcriptome.

Scope

This topic covers the experimental and computational workflow of RNA-seq: RNA selection and library preparation, short-read and long-read sequencing chemistries, read alignment or pseudo-alignment, and quantification and normalization of expression. It is a methodological reference within transcriptomics and does not provide clinical guidance.

Core questions

How is an RNA sample turned into a sequencing library, and what selection steps (poly-A or ribosomal depletion) shape what is measured?
How are reads aligned or pseudo-aligned to a reference, and quantified per gene or transcript?
How is normalization performed so that expression can be compared across samples?
What do long-read and direct-RNA technologies add over short-read sequencing?
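The second question can be made concrete with a toy model of pseudo-alignment as set intersection over k-mers. Each k-mer in the reference transcripts is mapped to the transcripts that contain it, and a read is assigned the set of transcripts compatible with all of its indexed k-mers. This is a minimal sketch under simplifying assumptions, not the algorithm of any particular tool. The helper names (`build_kmer_index`, `pseudoalign`, `count_classes`) are hypothetical, reverse complements are ignored, and k-mers absent from the index are simply skipped.

```python
from collections import Counter


def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with every indexed k-mer of the read.

    K-mers not present in the index are skipped; a read with no indexed
    k-mers gets the empty set (unassigned).
    """
    hits = [index[read[i:i + k]]
            for i in range(len(read) - k + 1)
            if read[i:i + k] in index]
    if not hits:
        return frozenset()
    return frozenset(set.intersection(*hits))


def count_classes(reads, index, k):
    """Tally reads per compatibility set (an equivalence class of transcripts)."""
    return Counter(pseudoalign(read, index, k) for read in reads)


# Toy reference: two isoforms that share their first eight bases.
transcripts = {"tx_A": "ACGTACGTTTGCA", "tx_B": "ACGTACGTCCGGA"}
index = build_kmer_index(transcripts, k=5)
print(pseudoalign("ACGTACGT", index, 5))  # compatible with both isoforms
print(pseudoalign("TACGTTTG", index, 5))  # compatible with tx_A only
```

Reads drawn from sequence shared between isoforms land in multi-transcript sets. Quantifiers then have to distribute those ambiguous counts across transcripts, typically with an expectation-maximization procedure, which this sketch leaves out.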

Key concepts

Library preparation and poly-A selection or rRNA depletion
Short-read versus long-read sequencing
Read alignment and pseudo-alignment
Read counting and quantification
Normalization and expression units (e.g., RPKM/FPKM, TPM)
Sequencing depth and coverage
Splice-junction detection
Direct RNA sequencing
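Splice-junction detection from read alignments can be illustrated through the SAM format. A spliced aligner reports a read that spans an intron with the CIGAR `N` operation, which the SAM specification defines as a region skipped on the reference. The sketch below uses a hypothetical helper, `splice_junctions`, that walks a CIGAR string from the read's 1-based `POS` and returns the skipped reference intervals. It assumes a well-formed CIGAR and applies none of the filtering (read support, anchor length, splice-site motif) that junction callers commonly add.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = frozenset("MDN=X")


def splice_junctions(pos, cigar):
    """Return (intron_start, intron_end) for each N operation, 1-based inclusive.

    pos is the SAM POS field: the 1-based reference position of the first
    aligned (non-clipped) base.
    """
    junctions = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            # Skipped reference region, interpreted as an intron in RNA-seq.
            junctions.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return junctions


# A 100-base read split across two exons by a 200-base intron.
print(splice_junctions(100, "50M200N50M"))  # [(150, 349)]
```

Soft clips (`S`) and insertions (`I`) do not move the reference coordinate, so they do not shift the reported junction. Whether a reference gap is emitted as `N` (intron) or `D` (deletion) is the aligner's decision, not something recoverable from the CIGAR alone.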

Mechanisms

In a typical workflow, RNA is extracted and a subset is selected (commonly polyadenylated messenger RNA, or total RNA after ribosomal RNA depletion), fragmented, reverse-transcribed into complementary DNA (cDNA), and built into an adapter-ligated library. Sequencing the library produces millions of reads, which are either aligned to a reference genome or transcriptome or assigned to transcripts by alignment-free pseudo-alignment. Reads overlapping each feature are then counted. Because longer transcripts and more deeply sequenced samples accrue more reads, counts are normalized for transcript length and library size before abundance is compared. Mortazavi and colleagues introduced this counting-and-normalization framework, including the RPKM unit (reads per kilobase of exon model per million mapped reads), and reviews by Wang and colleagues and by Ozsolak and Milos set out the platform's strengths and limitations. Long-read technologies sequence full-length cDNA or RNA molecules, and direct-RNA sequencing reads native RNA without reverse transcription; both improve isoform resolution at the cost of higher per-read error and lower throughput.
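The length-and-depth normalization described above can be sketched with two common units. FPKM divides each count by transcript length in kilobases and by total mapped fragments in millions. TPM first divides by length and then rescales so that every sample sums to one million, which keeps within-sample proportions comparable across samples. This is a minimal sketch assuming raw transcript lengths; quantification tools usually substitute an effective length that accounts for the fragment-length distribution. The function names and toy numbers are illustrative only.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())  # library size: all mapped fragments
    return {t: counts[t] / (lengths_bp[t] / 1e3) / (total / 1e6) for t in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize, then scale to sum to 1e6."""
    rates = {t: counts[t] / lengths_bp[t] for t in counts}  # reads per base
    total_rate = sum(rates.values())
    return {t: r / total_rate * 1e6 for t, r in rates.items()}


counts = {"g1": 100, "g2": 200, "g3": 700}
lengths = {"g1": 1000, "g2": 2000, "g3": 3500}
print(fpkm(counts, lengths))
print(tpm(counts, lengths))

# Doubling every raw count (as deeper sequencing would, in expectation)
# leaves both units unchanged.
deeper = {t: 2 * c for t, c in counts.items()}
print(fpkm(deeper, lengths))
print(tpm(deeper, lengths))
```

In this toy sample, g3 has seven times the raw count of g1 but only twice its TPM, because it is 3.5 times longer. The length correction is what separates "more reads" from "more molecules".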

Clinical relevance

RNA-seq generates much of the molecular-classification and biomarker-discovery evidence in research and translational genomics, and increasingly supports diagnostic transcript and fusion detection. As a reference topic it explains how transcriptomic evidence is produced; it is not a basis for individual diagnostic or treatment decisions.

Evidence & guidelines

Best-practice frameworks for RNA-seq build on the foundational quantification work of Mortazavi and colleagues and on method reviews (Wang and colleagues; Ozsolak and Milos). Long-read analysis adds its own considerations, surveyed by Amarasinghe and colleagues. These are methodological references rather than clinical practice guidelines.

History

RNA-seq emerged in 2008 as high-throughput short-read sequencing was applied to reverse-transcribed RNA, with Mortazavi and colleagues establishing how to map and quantify mammalian transcriptomes from read counts. Reviews over the following years consolidated library-preparation and analysis standards, and from the mid-2010s long-read and direct-RNA platforms extended the approach toward full-length isoform sequencing.

Debates

Short-read versus long-read sequencing: Short reads offer high throughput and low per-base error but reconstruct isoforms only indirectly, whereas long reads sequence full-length transcripts and resolve isoforms directly at the cost of higher error rates and lower depth; the appropriate choice depends on whether accurate quantification or isoform structure is the priority.

Key figures

Barbara Wold
Ali Mortazavi
Michael Snyder

Seminal works

mortazavi-2008
wang-2009
ozsolak-2011

Frequently asked questions

What is the difference between poly-A selection and ribosomal RNA depletion?: Poly-A selection captures polyadenylated messenger RNA and is efficient for protein-coding transcripts, whereas ribosomal RNA depletion removes abundant rRNA while retaining non-polyadenylated and degraded transcripts. The choice determines which RNA species the experiment can measure.
Why are RNA-seq counts normalized before comparison?: Raw read counts depend on transcript length and on how deeply each sample was sequenced, so they cannot be compared directly. Normalization adjusts for these factors so that differences in abundance reflect biology rather than technical depth or length.