Copy Number Variants: Detection and Classification
A copy-number variant (CNV) is a segment of DNA whose copy number differs between individuals, having been gained through duplication or lost through deletion relative to a reference genome. CNVs are a major component of structural variation, and the central methodological questions are how to detect them reliably from array or sequencing data and how to classify them by size, copy state, and likely significance.
Definition
A copy-number variant is a DNA segment, conventionally one kilobase or larger, whose copy number differs from that of a reference genome. It arises either as a deletion (copy loss) or as a duplication or higher-order amplification (copy gain).
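The definition can be read as a small decision rule. The sketch below is illustrative only: the `Segment` type, the `copy_state` function, and the default of two reference copies (a diploid autosomal region) are assumptions introduced here, and the one-kilobase cutoff is the conventional threshold named in the definition, not a fixed standard.

```python
from dataclasses import dataclass

# Conventional size threshold from the definition (one kilobase).
MIN_CNV_LENGTH_BP = 1_000


@dataclass(frozen=True)
class Segment:
    """Hypothetical container for one genomic segment."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    copies: int  # copy number observed in the sample

    @property
    def length(self) -> int:
        return self.end - self.start


def copy_state(segment: Segment, reference_copies: int = 2) -> str:
    """Label a segment as neutral, loss, or gain relative to the reference.

    Segments that differ in copy number but are shorter than the
    conventional threshold are reported separately rather than as CNVs.
    """
    if segment.copies == reference_copies:
        return "neutral"
    if segment.length < MIN_CNV_LENGTH_BP:
        return "below_size_threshold"
    return "loss" if segment.copies < reference_copies else "gain"


if __name__ == "__main__":
    for seg in (
        Segment("chr1", 10_000, 15_000, 1),  # 5 kb present in one copy
        Segment("chr1", 20_000, 26_000, 4),  # 6 kb present in four copies
        Segment("chr1", 30_000, 30_200, 1),  # 200 bp, under the threshold
    ):
        print(seg.chrom, seg.start, seg.end, copy_state(seg))
```

Passing a different `reference_copies` value covers regions where the reference is not diploid, such as a single-copy sex chromosome.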
Scope
This topic covers what a CNV is, the principal technologies used to detect and size them (array comparative genomic hybridization, SNP arrays, and read-depth or paired-end signals from sequencing), and the bases on which they are classified—gain versus loss, copy number, recurrence, and frequency. It is a reference treatment of detection and classification concepts and does not provide diagnostic interpretation for individuals.
Core questions
- What distinguishes a copy-number variant from other structural variants?
- Which signals—hybridization intensity, read depth, paired-end and split-read evidence—are used to call CNVs?
- How are CNVs classified by copy state, size, recurrence, and population frequency?
- What are the resolution limits and false-call sources of each detection platform?
Key concepts
- Copy gain (duplication) and copy loss (deletion)
- Array comparative genomic hybridization (aCGH)
- SNP array log R ratio and B-allele frequency
- Read-depth and paired-end detection
- Breakpoint resolution
- Recurrent vs. non-recurrent CNV
- Population frequency and benign vs. pathogenic classification
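The SNP-array signals named in the list above can be illustrated with an idealized model. In this sketch the log R ratio is taken to be the base-2 logarithm of total copies over reference copies, and the B-allele frequency is the fraction of copies carrying the B allele. Both function names are hypothetical, and real array data are noisier and typically show smaller log R shifts than this ideal model predicts.

```python
from __future__ import annotations

import math
from fractions import Fraction


def expected_lrr(copies: int, reference_copies: int = 2) -> float:
    """Idealized log R ratio for a given total copy number."""
    if copies <= 0:
        # No DNA present: the ideal model gives no signal at all.
        return float("-inf")
    return math.log2(copies / reference_copies)


def expected_baf_clusters(copies: int) -> list[Fraction]:
    """All B-allele fractions possible when `copies` alleles are present.

    With n copies, between 0 and n of them can be the B allele, so the
    possible fractions are 0/n, 1/n, ..., n/n.
    """
    if copies <= 0:
        return []  # B-allele frequency is undefined with no DNA
    return [Fraction(b, copies) for b in range(copies + 1)]


if __name__ == "__main__":
    for n in (1, 2, 3):
        clusters = ", ".join(str(f) for f in expected_baf_clusters(n))
        print(f"copies={n}  LRR={expected_lrr(n):+.3f}  BAF clusters: {clusters}")
```

Under this model a single-copy deletion loses the heterozygous cluster at 1/2, while a three-copy duplication splits it into clusters at 1/3 and 2/3, which is why allele-ratio information helps separate copy states that intensity alone leaves ambiguous.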
Mechanisms
CNV detection translates a physical change in DNA dosage into a measurable signal. Array comparative genomic hybridization and SNP arrays read relative hybridization intensity, so a deletion lowers and a duplication raises the signal across the affected interval; SNP arrays add allele-ratio information that helps distinguish copy states. Sequencing-based approaches infer copy number from read depth—more reads accumulate over duplicated regions and fewer over deletions—and use discordant paired-end and split-read alignments to localize breakpoints. Classification then combines copy state, size, whether the variant recurs at architecture-defined breakpoints, and its frequency in reference populations.
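The read-depth idea described above can be sketched as a ratio against a reference depth profile. This is a minimal illustration, not a production caller: the function name `estimate_copy_number` is hypothetical, the per-window depths are assumed to be already normalized for library size (and, in practice, for GC content and mappability), and the region is assumed to be diploid in the reference.

```python
import math
from statistics import median


def estimate_copy_number(sample_depths, reference_depths, reference_copies=2):
    """Estimate copy number for one region from per-window read depths.

    Returns (integer copy-number estimate, log2 depth ratio). The median
    of the per-window ratios is used so that a few outlier windows do not
    dominate the estimate.
    """
    if not sample_depths or len(sample_depths) != len(reference_depths):
        raise ValueError("depth lists must be non-empty and of equal length")

    # Skip windows where the reference has no coverage to avoid dividing by zero.
    ratios = [s / r for s, r in zip(sample_depths, reference_depths) if r > 0]
    if not ratios:
        raise ValueError("no windows with usable reference depth")

    ratio = median(ratios)
    log2_ratio = math.log2(ratio) if ratio > 0 else float("-inf")
    copies = round(reference_copies * ratio)
    return copies, log2_ratio


if __name__ == "__main__":
    reference = [30, 30, 30, 30]
    print(estimate_copy_number([15, 14, 16, 15], reference))  # fewer reads over a deletion
    print(estimate_copy_number([45, 44, 46, 45], reference))  # more reads over a duplication
```

Depth alone gives the copy state of a region but only coarse boundaries, which is why callers combine it with discordant paired-end and split-read evidence to place breakpoints.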
Clinical relevance
Copy-number analysis is widely used in the health sciences to characterize genomic gains and losses, and distinguishing common benign CNVs from rare dosage-altering events is central to interpreting genomic data. This entry describes how CNVs are detected and categorized as a methodological matter; it is not a basis for individual diagnosis or management.
Epidemiology
Early genome-wide surveys established that CNVs are common in healthy individuals: Sebat and colleagues, together with Iafrate and colleagues, first showed widespread copy-number polymorphism, and Redon and colleagues mapped global CNV across HapMap populations. Subsequent sequencing-based catalogues, including the 1000 Genomes structural-variation map, refined frequencies and showed that deletions and duplications collectively span a large portion of the variable genome.
History
The recognition that copy number varies widely among healthy people emerged in 2004 from array studies by Sebat and by Iafrate and colleagues, overturning the assumption that such variation was rare. Whole-genome CNV maps from array platforms followed in 2006, and the shift to high-throughput sequencing in the following decade brought read-depth and paired-end methods that improved breakpoint resolution and merged CNV calling into general structural-variant discovery.
Debates
- How should detection-platform differences be reconciled?
- Arrays and sequencing-based callers report overlapping but non-identical CNV sets, differing in size resolution, breakpoint precision, and sensitivity in repetitive regions, so harmonizing calls and frequencies across platforms remains a recognized methodological challenge.
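One common way to compare call sets is to ask whether two calls overlap reciprocally, meaning the shared interval covers at least a set fraction of each call. The sketch below assumes calls are simple `(chrom, start, end, state)` tuples with half-open coordinates. The function names are hypothetical, and the 0.5 default is a frequently used convention treated here as an adjustable assumption rather than a standard.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions between intervals a and b."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0  # disjoint, merely adjacent, or empty intervals
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def same_event(call_a, call_b, min_reciprocal=0.5):
    """Decide whether two calls plausibly describe the same CNV.

    Calls must be on the same chromosome, have the same copy state
    ("gain" or "loss"), and overlap reciprocally by at least
    `min_reciprocal`.
    """
    chrom_a, start_a, end_a, state_a = call_a
    chrom_b, start_b, end_b, state_b = call_b
    if chrom_a != chrom_b or state_a != state_b:
        return False
    return reciprocal_overlap(start_a, end_a, start_b, end_b) >= min_reciprocal


if __name__ == "__main__":
    array_call = ("chr1", 1_000, 11_000, "loss")      # coarser boundaries
    sequencing_call = ("chr1", 2_000, 10_000, "loss")  # tighter boundaries
    print(reciprocal_overlap(1_000, 11_000, 2_000, 10_000))
    print(same_event(array_call, sequencing_call))
```

Requiring the overlap to be reciprocal prevents a small call nested inside a much larger one from being counted as the same event, which matters when platforms size a variant very differently.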
Key figures
- Jonathan Sebat
- Stephen W. Scherer
- Charles Lee
- Evan E. Eichler
- Nigel P. Carter
Seminal works
- sebat-2004
- redon-2006
- alkan-2011
Frequently asked questions
- What is the difference between a CNV and a deletion?
- A deletion is one kind of copy-number variant (a copy loss). The term CNV is broader and also includes duplications and higher-order copy gains, so every deletion of the relevant size is a CNV but not every CNV is a deletion.
- Why can two platforms report different CNVs for the same sample?
- Array and sequencing methods differ in resolution, breakpoint precision, and sensitivity within repetitive regions, so they capture overlapping but not identical sets of variants and may size the same event differently.
Methods for this concept
- Copy Number Variation Analysis
- Machine Learning-Assisted Copy Number Variation Analysis
- Bayesian Copy Number Variation Analysis
- Differential Copy Number Variation Analysis
- Single-cell Copy Number Variation Analysis
- Variant Calling
- Time-Series Copy Number Variation Analysis
- Network-Based Copy Number Variation Analysis