Data Visualization
Data visualization is the graphical display of data so that its patterns, distributions, and relationships can be perceived directly. Well-chosen displays — histograms, box plots, scatter plots, and others — reveal features such as skew, clustering, and outliers that numerical summaries alone can conceal, making graphics an integral part of describing and exploring data.
Definition
Data visualization is the practice of representing data and statistical summaries graphically — through plots such as histograms, box plots, and scatter plots — to make distributional shape, comparison, and relationship visually apparent.
Scope
This entry covers the role of graphical display in summarising data, the principal chart types used in the health sciences, and the principles of graphical perception that make some displays more readable than others. It is a methodological reference and does not provide clinical guidance.
Core questions
- Which display best reveals the feature of the data in question — distribution, comparison, or relationship?
- How do the principles of graphical perception affect which encodings are read accurately?
- How can a chart mislead, and how is that avoided?
Key concepts
- Histogram
- Box plot
- Scatter plot
- Bar chart and frequency display
- Graphical perception and encoding accuracy
- Exploratory data analysis
- Misleading graphics
Key theories
- Graphical perception
- Cleveland and McGill's theory of graphical perception ranks visual encodings by how accurately people decode them, running roughly from position (most accurate) through length, angle, and area to colour (least accurate). The ranking provides an empirical basis for preferring position-based displays such as dot and scatter plots over area- or angle-based ones such as pie charts.
Mechanisms
Different displays expose different features. A histogram shows the shape of a single distribution: its centre, spread, skew, and modality. A box plot compactly summarises the median and quartiles and flags outliers, making it efficient for comparing the distribution of a variable across groups. A scatter plot reveals the relationship between two continuous variables. The effectiveness of any display rests on graphical perception: empirical studies show that the eye decodes some encodings (position along a common scale) far more accurately than others (angle, area, colour saturation), which is why position-based plots are generally preferred and why pie charts and three-dimensional effects are discouraged. Sound design also avoids distortions, such as truncated or inconsistent axes and excessive ornamentation, that can give the reader a false impression.
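The quantities a box plot draws can be computed directly. The sketch below is illustrative only: it assumes Tukey's common convention of flagging points more than 1.5 times the interquartile range beyond the quartiles, and it uses the "inclusive" quartile method from Python's standard library. Neither choice is specified in this entry, and other software may use different quartile definitions. The function name `box_plot_summary` and the sample values are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Return the quantities a Tukey-style box plot would draw.

    Assumption: outliers are points beyond 1.5 * IQR from the quartiles.
    """
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme points that are not outliers.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical sample with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(sample)
print(summary)
```

For this sample the value 30 lies above the upper fence and would be drawn as a separate point, while the upper whisker stops at 9.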
Clinical relevance
Figures carry much of the message in clinical papers and presentations, and the ability to read them critically — and to recognise misleading ones — is part of appraising evidence. This entry describes principles of graphical display for that purpose and is not a basis for individual diagnostic or treatment decisions.
Epidemiology
Graphical display is used at every stage of health research, from exploring raw data and checking distributional assumptions to communicating findings to clinical and public audiences. The choice and honesty of displays directly affect how clearly and accurately study results are understood.
History
Statistical graphics trace to the late eighteenth and nineteenth centuries, in the work of William Playfair, who introduced the line, bar, and pie charts, and Florence Nightingale, who used graphics to argue for sanitary reform. The modern era was shaped by John Tukey's Exploratory Data Analysis (1977), which introduced and popularised displays such as the box plot; by Cleveland and McGill's empirical study of graphical perception (1984); and by Edward Tufte's principles for the honest and efficient display of quantitative information.
Debates
- Which displays should be preferred for accurate reading?
- Research on graphical perception shows that quantities encoded by position along a scale are judged more accurately than those encoded by angle or area, which underpins long-standing advice to favour dot, bar, and scatter plots and to avoid pie charts and three-dimensional decoration.
Key figures
- John W. Tukey
- William S. Cleveland
- Edward R. Tufte
Related topics
Seminal works
- tukey-1977
- cleveland-1984
- tufte-2001
- mcgill-1978
Frequently asked questions
- Why use a graph when summary statistics are already reported?
- Graphs reveal features — skew, multiple peaks, outliers, and relationships between variables — that single numbers such as the mean and standard deviation can hide, so they complement numerical summaries rather than replacing them.
- What makes one chart easier to read accurately than another?
- People decode position along a common scale more accurately than angle, area, or colour. Displays that rely on position, such as dot and scatter plots, are therefore generally read more reliably than pie charts or three-dimensional graphics.
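To illustrate the first answer above, the following sketch builds two small, made-up samples that share the same mean and standard deviation but have different shapes, then prints a crude text histogram of each. The data, the variable names, and the helper `text_histogram` are hypothetical and chosen only for illustration. The population standard deviation (`statistics.pstdev`) is used so that the arithmetic comes out exactly.

```python
from collections import Counter
import statistics

# Two hypothetical samples: same mean (5) and population SD (2).
single_peak = [2, 4, 4, 4, 5, 5, 7, 9]
two_peaks = [3, 3, 3, 3, 7, 7, 7, 7]


def text_histogram(values):
    """Return one line per integer value, with one '#' per observation."""
    counts = Counter(values)
    lines = []
    for v in range(min(values), max(values) + 1):
        lines.append(f"{v:>2} | {'#' * counts[v]}")
    return "\n".join(lines)


for name, data in [("single_peak", single_peak), ("two_peaks", two_peaks)]:
    print(name, "mean =", statistics.mean(data), "SD =", statistics.pstdev(data))
    print(text_histogram(data))
    print()
```

The numerical summaries are identical, yet the first sample has one peak with a tail to the right and the second has two separated clusters. Only the display reveals that difference.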