RNA Sequencing Methods and Technologies
RNA sequencing (RNA-seq) measures the transcriptome by converting RNA into a sequencing library and counting the resulting reads that map back to genes and transcripts. By sampling transcripts directly rather than hybridizing them to predefined probes, RNA-seq can quantify expression across a wide dynamic range, discover previously unannotated transcripts and splice junctions, and resolve isoform structure.
Definition
RNA sequencing is a high-throughput sequencing approach in which RNA is reverse-transcribed into complementary DNA and sequenced (or sequenced directly as RNA), and the resulting reads are mapped and counted to quantify transcript abundance and characterize transcript structure across the transcriptome.
Scope
This topic covers the experimental and computational workflow of RNA-seq: RNA selection and library preparation, short-read and long-read sequencing chemistries, read alignment or pseudo-alignment, and quantification and normalization of expression. It is a methodological reference within transcriptomics and does not provide clinical guidance.
Core questions
- How is an RNA sample turned into a sequencing library, and what selection steps (poly-A or ribosomal depletion) shape what is measured?
- How are reads aligned or pseudo-aligned to a reference, and quantified per gene or transcript?
- How is normalization performed so that expression can be compared across samples?
- What do long-read and direct-RNA technologies add over short-read sequencing?
Key concepts
- Library preparation and poly-A selection or rRNA depletion
- Short-read versus long-read sequencing
- Read alignment and pseudo-alignment
- Read counting and quantification
- Normalization and expression units (e.g., RPKM/FPKM and TPM)
- Sequencing depth and coverage
- Splice-junction detection
- Direct RNA sequencing
Mechanisms
In a typical workflow, RNA is extracted and a subset is selected (commonly polyadenylated messenger RNA, or total RNA after ribosomal RNA depletion), fragmented, reverse-transcribed into complementary DNA, and built into an adapter-ligated library. The library is sequenced to produce millions of reads, which are either aligned to a reference genome or transcriptome or assigned to transcripts by alignment-free pseudo-alignment. Reads overlapping each feature are counted; because longer transcripts and more deeply sequenced samples accrue more reads, counts are normalized for transcript length and library size before abundance is compared. Mortazavi and colleagues introduced the core counting-and-normalization framework, and reviews by Wang and colleagues and by Ozsolak and Milos set out the platform's strengths and limitations. Long-read and direct-RNA technologies sequence full-length molecules, improving isoform resolution at the cost of higher per-read error and lower throughput.
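The counting step can be illustrated with a minimal Python sketch. It assumes reads have already been aligned and reduced to intervals; the gene names, coordinates, and the rule of discarding reads that overlap zero or several genes are illustrative assumptions, not the behaviour of any particular counting tool.

```python
# Hypothetical annotation: gene -> (chromosome, start, end), half-open coordinates.
GENES = {
    "geneA": ("chr1", 100, 600),
    "geneB": ("chr1", 1000, 3000),
}

# Hypothetical aligned reads as (chromosome, start, end), half-open coordinates.
READS = [
    ("chr1", 120, 170),    # inside geneA
    ("chr1", 580, 630),    # overlaps the end of geneA
    ("chr1", 700, 750),    # intergenic
    ("chr1", 1500, 1550),  # inside geneB
    ("chr1", 2990, 3040),  # overlaps the end of geneB
    ("chr2", 120, 170),    # chromosome without annotated genes
]


def count_reads(genes, reads):
    """Count reads per gene; reads hitting zero or several genes are unassigned."""
    counts = {name: 0 for name in genes}
    unassigned = 0
    for chrom, start, end in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if chrom == g_chrom and start < g_end and end > g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        else:
            unassigned += 1
    return counts, unassigned


counts, unassigned = count_reads(GENES, READS)
print(counts, unassigned)
```

The linear scan over all genes for every read keeps the sketch short; a real annotation with tens of thousands of features would normally call for an interval index, along with explicit handling of strand and spliced reads.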
Clinical relevance
RNA-seq generates much of the molecular-classification and biomarker-discovery evidence in research and translational genomics, and increasingly supports the diagnostic detection of aberrant transcripts and gene fusions. As a reference topic it explains how transcriptomic evidence is produced; it is not a basis for individual diagnostic or treatment decisions.
Evidence & guidelines
Best-practice frameworks for RNA-seq derive from the foundational quantification work of Mortazavi and colleagues and from method reviews (Wang and colleagues; Ozsolak and Milos). Long-read analysis adds its own considerations, surveyed by Amarasinghe and colleagues. These are methodological references rather than clinical practice guidelines.
History
RNA-seq emerged in 2008 as high-throughput short-read sequencing was applied to reverse-transcribed RNA, with Mortazavi and colleagues establishing how to map and quantify mammalian transcriptomes from read counts. Reviews over the following years consolidated library-preparation and analysis standards, and from the mid-2010s long-read and direct-RNA platforms extended the approach toward full-length isoform sequencing.
Debates
- Short-read versus long-read sequencing
  - Short reads offer high throughput and low per-base error but reconstruct isoforms only indirectly, whereas long reads sequence full-length transcripts and resolve isoforms directly at the cost of higher error rates and lower depth; the appropriate choice depends on whether accurate quantification or isoform structure is the priority.
Key figures
- Barbara Wold
- Ali Mortazavi
- Michael Snyder
Related topics
Seminal works
- mortazavi-2008
- wang-2009
- ozsolak-2011
Frequently asked questions
- What is the difference between poly-A selection and ribosomal RNA depletion?
  - Poly-A selection captures polyadenylated messenger RNA and is efficient for protein-coding transcripts, whereas ribosomal RNA depletion removes abundant rRNA while retaining non-polyadenylated and degraded transcripts. The choice determines which RNA species the experiment can measure.
- Why are RNA-seq counts normalized before comparison?
  - Raw read counts depend on transcript length and on how deeply each sample was sequenced, so they cannot be compared directly. Normalization adjusts for these factors so that differences in abundance reflect biology rather than technical depth or length.
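As a worked illustration of the normalization answer above, the following Python sketch computes two common units, RPKM and TPM, from raw counts. The gene names, lengths, and counts are made-up values chosen so the arithmetic is easy to follow; the functions are a minimal sketch of the standard definitions, not the implementation of any specific package.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts.values())
    return {
        gene: counts[gene] * 1e9 / (total_reads * lengths_bp[gene])
        for gene in counts
    }


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to sum to 1e6."""
    rate = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    scale = sum(rate.values())
    return {gene: rate[gene] / scale * 1e6 for gene in counts}


# Hypothetical transcript lengths in base pairs.
lengths = {"geneA": 1000, "geneB": 4000}

# Sample 2 is the same library sequenced twice as deeply as sample 1.
sample1 = {"geneA": 100, "geneB": 400}
sample2 = {"geneA": 200, "geneB": 800}

tpm1 = tpm(sample1, lengths)
tpm2 = tpm(sample2, lengths)
rpkm1 = rpkm(sample1, lengths)

print(tpm1)
print(tpm2)
print(rpkm1)
```

In this toy case geneB collects four times as many reads as geneA only because it is four times longer, and sample 2 has twice the counts only because it was sequenced more deeply. After normalization both genes receive the same value and both samples agree. Units of this kind correct for length and depth only; differential-expression methods typically apply their own between-sample normalization to raw counts.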
Methods for this concept
- RNA-seq Differential Expression
- Single-cell RNA-seq analysis
- De Novo Transcriptome Assembly
- Bayesian RNA-seq differential expression
- Time-series single-cell RNA-seq analysis
- Differential single-cell RNA-seq analysis
- Single-cell RNA-seq differential expression
- Machine learning-assisted RNA-seq differential expression