Educational Assessment and Learning Outcomes
Educational assessment is the process of gathering and interpreting evidence about what learners know and can do, against defined learning outcomes. It distinguishes assessment that supports learning (formative) from assessment that certifies achievement (summative), and it is judged by qualities such as validity, reliability, and educational impact.
Definition
Educational assessment is the systematic collection and interpretation of evidence of learning against intended outcomes. It is used either to support further learning (formative) or to make decisions about achievement and progression (summative). Learning outcomes are the statements of what learners should be able to do, and they are what assessment is designed to measure.
Scope
This topic covers the purposes and qualities of assessment in health professions education, frameworks for what to assess, the contrast between formative and summative assessment, and the related idea of programme evaluation. It treats assessment as a methodological topic and is not a guide to grading or examining any specific course.
Core questions
- What is the purpose of a given assessment - to support learning or to certify it?
- Which level of competence does an assessment target?
- What makes an assessment valid, reliable, and defensible?
- How do individual assessments combine into a coherent programme?
Key concepts
- Formative and summative assessment
- Validity and reliability
- Learning outcomes and objectives
- Miller's pyramid of competence
- Workplace-based assessment
- Programmatic assessment
- Programme evaluation
Key theories
- Miller's pyramid
- A framework describing four ascending levels of clinical competence (knows, knows how, shows how, and does), used to match assessment methods to the level of competence being judged.
- Programmatic assessment
- An approach that treats individual assessments as data points combined deliberately over time, optimising the whole programme for both learning and decision-making rather than relying on isolated high-stakes tests.
- Utility of assessment
- The view that the value of an assessment is a product of several qualities - validity, reliability, educational impact, acceptability, and cost - that must be balanced rather than maximised individually.
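The multiplicative reading of utility can be illustrated with a short sketch. It is not a published scoring tool. The 0-to-1 rating scale, the unweighted product, the `cost_efficiency` encoding of cost, and the names `Qualities` and `utility` are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Qualities:
    """Illustrative ratings on an assumed 0-1 scale (not from the source)."""
    validity: float
    reliability: float
    educational_impact: float
    acceptability: float
    cost_efficiency: float  # assumed encoding of cost: higher means more affordable


def utility(q: Qualities) -> float:
    """Return the unweighted product of the five quality ratings."""
    result = 1.0
    for f in fields(q):
        value = getattr(q, f.name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{f.name} must be between 0 and 1, got {value}")
        result *= value
    return result


# Hypothetical designs: one moderate on every quality, one excellent on four
# qualities but barely acceptable to its users.
balanced = Qualities(0.7, 0.7, 0.7, 0.7, 0.7)
lopsided = Qualities(1.0, 1.0, 1.0, 0.1, 1.0)

print(utility(balanced) > utility(lopsided))
```

Because the qualities multiply, a very low rating on any one of them pulls the overall utility down sharply, and a rating of zero removes it entirely. In this sketch the balanced design scores higher than the lopsided one, which is the sense in which the qualities must be balanced rather than maximised individually.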
Mechanisms
Assessment is designed by matching the method to the purpose and to the level of competence being judged. Miller's pyramid (Miller, 1990) orders methods from testing knowledge (knows, knows how) to observing performance (shows how, does); written tests, for example, suit the lower levels and workplace observation suits the higher ones. The chosen methods are then appraised for utility: validity, reliability, impact on learning, acceptability, and cost. In programmatic approaches they are combined into a deliberate sequence of low- and high-stakes data points that together support both learning and robust decisions (Epstein, 2007; Van der Vleuten et al., 2012). Programme evaluation extends the same logic to judging the educational programme itself (Frye & Hemmer, 2012).
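The matching step can be sketched as a simple lookup. This is a minimal illustration, assuming only the two broad method families named above (written tests for the lower levels, observation of performance for the higher ones). The names `MillerLevel`, `method_family`, and `is_aligned` are hypothetical, and real blueprinting distinguishes many more methods.

```python
from enum import IntEnum


class MillerLevel(IntEnum):
    """The four levels of Miller's pyramid, lowest to highest."""
    KNOWS = 1
    KNOWS_HOW = 2
    SHOWS_HOW = 3
    DOES = 4


def method_family(level: MillerLevel) -> str:
    """Return the broad method family suited to a level (assumed two-way split)."""
    if level <= MillerLevel.KNOWS_HOW:
        return "written test"
    return "observation of performance"


def is_aligned(planned_method: str, level: MillerLevel) -> bool:
    """Check whether a planned method family matches the targeted level."""
    return planned_method == method_family(level)


# Hypothetical blueprint check: a written test aimed at the "shows how" level
# is flagged as misaligned.
print(is_aligned("written test", MillerLevel.SHOWS_HOW))
```

A check of this kind only flags a mismatch between method and level. Whether an aligned method is also worth using still depends on the utility appraisal described above.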
Clinical relevance
Assessment shapes what learners study and how educators judge competence, so understanding its principles supports the design and critique of fair, defensible evaluation in health professions education. The topic describes how learning is measured and is not a basis for individual clinical decisions.
Evidence & guidelines
Assessment practice in the health professions is guided by widely cited frameworks - Miller's pyramid for matching methods to competence (Miller, 1990), the utility concept and reviews of assessment methods (Epstein, 2007), and programmatic assessment for combining evidence over time (Van der Vleuten et al., 2012). Programme evaluation draws on established models such as those summarised by Frye and Hemmer (2012). Much of this evidence is conceptual and consensus-based rather than experimental.
History
Assessment in the health professions shifted over the late twentieth century from a focus on knowledge testing toward the direct observation of performance, crystallised by Miller's 1990 pyramid. Subsequent decades emphasised the multidimensional utility of assessment, workplace-based methods, and - more recently - programmatic approaches that integrate many assessments over time rather than relying on single high-stakes examinations.
Debates
- Can validity and reliability be maximised at the same time?
- Authentic, performance-based assessments often gain validity at some cost to standardisation and reliability. Designers must therefore balance the qualities of an assessment rather than optimise any one of them, a tension central to both the utility concept and programmatic approaches.
Key figures
- George Miller
- Cees van der Vleuten
- Ronald Epstein
- Lambert Schuwirth
Seminal works
- Miller (1990)
- Epstein (2007)
- Van der Vleuten et al. (2012)
Frequently asked questions
- What is the difference between formative and summative assessment?
- Formative assessment is intended to support and guide further learning through feedback, while summative assessment is used to certify achievement and make decisions such as passing or progression.
- What does Miller's pyramid describe?
- It describes four levels of clinical competence - knows, knows how, shows how, and does - and helps match the assessment method to the level of competence being evaluated.