ScholarGate
The library

Explore science by method, field & evidence.

One catalogue of research methods — learn how each one works, when to use it, and what it can’t do.

8,178 methods · 11 fields · 7 method families · 40 languages
Field: Health & Medicine (716) · Psychology (570) · Business & Finance (410) · Engineering (330) · Life Sciences (263) · Education (261) · Research Practice (248)
A content-first reference library for research methods — what each one is, how it works, and where it comes from.

Open data (CC-BY)
Entries are compiled from published sources for reference. Verifying the accuracy and suitability of any information for your own use remains your responsibility.

© 2026 ScholarGate · A research-method reference library
Field (continued): Natural Sciences (236) · Social Sciences (185) · Environment & Sustainability (160) · Law (30)
Method: Statistics (1,836) · AI & ML (1,661) · Decision Sciences (932) · Research Methods (1,354) · Measurement (1,745) · Causal & Evidence (532) · Research Practice (118)
150 methods in Education · Measurement
Methods at the intersection of your two filters.
psychometrics

2PL IRT

The two-parameter logistic item response model, formalised by Frederic Lord (1980), describes the probability that a respondent answers a binary test item correctly as a smooth S-shaped function of the respondent's latent ability. By estimating a separate discrimination parameter for each item alongside a difficulty pa

2 sources · 1980
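
The S-shaped response function described in the 2PL IRT entry above can be sketched in a few lines of Python. This is a minimal illustration, not the catalogue's own code: the names `prob_2pl`, `theta` (latent ability), `a` (discrimination), and `b` (difficulty) are assumptions, and the plain logistic metric (no 1.7 scaling constant) is assumed.

```python
import math


def prob_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under a two-parameter logistic model.

    theta: respondent's latent ability (hypothetical name)
    a:     item discrimination (slope of the curve at theta == b)
    b:     item difficulty (ability at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Illustrative values only: one item, three ability levels.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(prob_2pl(theta, a=1.2, b=0.0), 3))
```

A larger `a` makes the curve steeper around `b`, which is what lets a highly discriminating item separate respondents just below and just above its difficulty.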
psychometrics

3PL IRT

The three-parameter logistic (3PL) model, introduced by Allan Birnbaum in 1968, is an item response theory model that describes the probability of a correct response to a binary test item as a function of three item-level parameters — difficulty, discrimination, and a lower asymptote representing guessing — and one per

2 sources · 1968
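
To make the role of the three item parameters in the 3PL IRT entry concrete, here is a minimal Python sketch. The function and argument names (`prob_3pl`, `theta`, `a`, `b`, `c`) are illustrative assumptions, and the plain logistic metric is assumed.

```python
import math


def prob_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under a three-parameter logistic model.

    theta: respondent's latent ability (hypothetical name)
    a:     item discrimination
    b:     item difficulty
    c:     lower asymptote (chance of success by guessing), 0 <= c < 1
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    # The guessing parameter lifts the floor of the curve from 0 to c.
    return c + (1.0 - c) * logistic


# Illustrative values only: a four-option item with c = 0.25.
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(prob_3pl(theta, a=1.0, b=0.0, c=0.25), 3))
```

Because of the lower asymptote, the probability at `theta == b` is `c + (1 - c) / 2` rather than 0.5.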
educational psychology

Academic Burnout Scale

The Academic Burnout Scale measures three dimensions of student burnout: emotional exhaustion, cynicism toward studies, and reduced academic efficacy. Developed by Schaufeli and colleagues in 2002, it adapts the Maslach Burnout Inventory framework to the academic context, providing researchers and educators with a vali

2 sources · 2002
educational psychology

Academic Help-Seeking Scale

The Academic Help-Seeking Scale measures students' inclination to seek academic help, their preferred sources of assistance (instructors, peers, tutors), and barriers that inhibit help-seeking (fear of judgment, embarrassment, preference for independence). Developed by Karabenick and colleagues in the 1990s, the AHSS r

2 sources · 1990
educational psychology

Academic Integrity Scale

The Academic Integrity Scale measures students' attitudes, values, and likelihood of engaging in academic dishonesty including cheating, plagiarism, and unauthorized collaboration. Multiple validated versions exist, each assessing different facets of academic integrity such as personal integrity commitment, perceived c

2 sources · 2000
educational psychology

Academic Motivation Scale

The Academic Motivation Scale (AMS) is a 28-item self-report instrument developed by Vallerand et al. (1992) to assess the quality of students' academic motivation. It distinguishes between intrinsic motivation (motivation for knowledge, accomplishment, and stimulation), extrinsic motivation (external regulation, intro

2 sources · 1992
educational psychology

Academic Resilience Scale

The Academic Resilience Scale measures the capacity of students to withstand and recover from academic adversity, including setbacks, failures, and difficult transitions. Developed by Cassidy in 2016, the ARS-30 conceptualizes resilience as a dynamic, multidimensional process involving perseverance, adaptive help-seeki

2 sources · 2016
educational psychology

Academic Self-Efficacy Scale

The Academic Self-Efficacy Scale (ASES) measures students' beliefs about their capability to succeed in academic tasks. Grounded in Bandura's social cognitive theory, the instrument assesses perceived competence in diverse academic domains—understanding lectures, completing assignments, performing on exams, and engagin

2 sources · 1977
psychometrics

Anchor-Based Minimal Important Difference

The anchor-based method for establishing Minimal Clinically Important Difference (MCID) is a technique for determining the smallest change in a patient-reported outcome (PRO) that patients or clinicians perceive as meaningful or important. Pioneered by Guyatt, Jaeschke, and Singer in 1989, this approach anchors changes

3 sources · 1989
education

Angoff Standard Setting

The Angoff method is a test-centered procedure for establishing a passing score (cut score) on an examination. A panel of content experts conceptualizes a 'borderline' or minimally competent examinee and, for each item, estimates the probability that such an examinee would answer it correctly. Summing those probabiliti

2 sources · 1971
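
The Angoff entry above describes panelists estimating, item by item, the probability that a borderline examinee answers correctly. One common way to turn those ratings into a cut score can be sketched as follows. This is an assumption-laden illustration: the entry text is truncated, so averaging across panelists before summing is assumed here, and the names `angoff_cut_score` and `ratings` are hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings: list[list[float]]) -> float:
    """Raw cut score from Angoff-style ratings.

    ratings[p][i] is panelist p's estimated probability that a borderline
    examinee answers item i correctly (hypothetical data layout).
    """
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every panelist must rate every item")
    # Average each item's ratings across panelists, then sum over items.
    item_means = [mean(row[i] for row in ratings) for i in range(n_items)]
    return sum(item_means)


# Made-up ratings: three panelists, four items.
panel = [
    [0.6, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.4, 0.8],
    [0.5, 0.9, 0.6, 1.0],
]
print(round(angoff_cut_score(panel), 2))
```

The result is on the raw-score scale: an expected number of items correct for the borderline examinee.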
psychometrics

Bifactor Model

The bifactor measurement model specifies that every indicator loads simultaneously on a single general factor and on one of several specific (group) factors. Formally introduced by Holzinger and Swineford in 1937 and brought into mainstream psychometrics by Reise (2012), it is now the standard tool for evaluating wheth

2 sources · 1937
education

Bookmark Standard Setting

The Bookmark method is an item-response-theory-based standard-setting procedure in which test items are arranged in a booklet ordered from easiest to hardest. Panelists page through this ordered item booklet and place a 'bookmark' at the point separating items a borderline examinee would likely master from those they w

2 sources · 2001
psychometrics

Case-Cohort Design

Case-cohort design is an epidemiological study design developed by Prentice (1986) that efficiently combines features of case-control and cohort studies. Researchers enroll an entire cohort, follow it for outcomes, then measure exposures only on cases and a random subcohort, reducing measurement costs while maintaining

3 sources · 1986
psychometrics

CAT Generalizability Theory

Generalizability theory (G-theory) applied to computerized adaptive testing (CAT) evaluates the dependability of adaptive test scores by decomposing score variance across measurement facets such as persons, items, and occasions. Unlike classical test theory, G-theory quantifies multiple simultaneous sources of measurem

2 sources · 1972
psychometrics

CAT McDonald's Omega

McDonald's omega adapted for computerized adaptive testing (CAT) quantifies the reliability of ability or trait estimates when different examinees answer different subsets of items. Unlike Cronbach's alpha, omega is grounded in a factor model, making it suitable for the heterogeneous item pools and variable test length

2 sources · 1999
psychometrics

CAT Scale Development

Computerized adaptive test (CAT) scale development is the process of constructing, calibrating, and validating a large item bank such that the assessment algorithm can select items tailored to each examinee's estimated ability or trait level in real time. The result is a measurement instrument that achieves high precis

2 sources · 1970
psychometrics

CAT-DIF

CAT-DIF identifies items in a computerized adaptive test that behave differently across demographic or group subpopulations after controlling for overall ability. Because adaptive algorithms select items non-randomly based on each examinee's estimated proficiency, standard DIF detection methods require adjustment befor

2 sources · 1990
educational psychology

Classroom Environment Scale

The Classroom Environment Scale is a comprehensive instrument measuring the social, emotional, and organizational climate of educational settings. Developed by Moos and Trickett in 1974, the CES assesses students' or teachers' perceptions of classroom relationships, instructional climate, and classroom management. By p

2 sources · 1974
psychometrics

Cognitive Diagnosis Model

Cognitive Diagnosis Models (CDMs) are a family of latent variable models designed to classify examinees according to their mastery of a set of discrete cognitive attributes or skills. The Generalized DINA (G-DINA) framework, introduced by Jimmy de la Torre in 2011, provides a unifying structure that encompasses many sp

1 source · 2011
psychometrics

Cognitive Diagnostic Computerized Adaptive Testing

Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) combines computerized adaptive testing (CAT) with cognitive diagnostic models (CDMs) to efficiently assess students' specific skill profiles. Rather than producing a single overall ability score, CD-CAT adaptively selects items to quickly identify which skills

3 sources · 2007
education

Cognitive Diagnostic Modeling

Cognitive diagnostic models (CDMs), also called diagnostic classification models, are restricted latent class models that report not a single ability score but a profile of which discrete skills or attributes a student has mastered. Each item is linked to the attributes it requires through a Q-matrix, and the model cla

2 sources · 2010
psychometrics

Computerized adaptive test construct validity

Construct validity in computerized adaptive testing evaluates whether the latent trait estimates produced by a CAT instrument genuinely measure the intended psychological or educational construct. Because adaptive algorithms select items individually for each examinee, the validity evidence gathered must account for th

2 sources · 1989
psychometrics

Computerized Adaptive Test Content Validity

Content validity in computerized adaptive testing (CAT) ensures that an adaptively administered assessment adequately samples the intended content domain despite delivering only a subset of items to each examinee. It integrates classical content validity methods with CAT-specific item bank design and content balancing

2 sources · 1975
psychometrics

Computerized Adaptive Test Convergent Validity

Convergent validity assessment for computerized adaptive tests (CATs) examines whether the ability or trait estimates produced by an adaptive algorithm correlate substantially with scores from other measures of the same construct. Because each examinee receives a different subset of items in a CAT, demonstrating that t

2 sources · 1989
psychometrics

Computerized adaptive test discriminant validity

Discriminant validity in computerized adaptive testing (CAT) is the evaluation process confirming that a CAT-administered scale measures its intended construct distinctly from related but conceptually different constructs. Despite the adaptive item-selection mechanism varying each respondent's item set, evidence must b

2 sources · 1959
psychometrics

Computerized adaptive test item response theory

Computerized adaptive testing based on item response theory is a sequential measurement procedure in which a computer algorithm selects successive test items tailored to each examinee's estimated ability level. Drawing on IRT to model item characteristics and ability estimation, CAT delivers precise scores with far few

2 sources · 1970
psychometrics

Computerized adaptive test measurement invariance

Computerized adaptive test measurement invariance evaluates whether a CAT instrument measures the same latent construct with the same psychometric properties across different groups (e.g., gender, language, clinical vs. community) or time points. It combines IRT-based adaptive test frameworks with measurement equivalen

2 sources · 1990
psychometrics

Computerized adaptive test Rasch model

Computerized adaptive testing with the Rasch model selects items in real time based on each examinee's evolving ability estimate, so that every person receives a test precisely calibrated to their proficiency level. The result is a shorter, more efficient measurement instrument that loses none of the precision of a ful

2 sources · 1960
psychometrics

Computerized adaptive test reliability analysis

CAT reliability analysis quantifies measurement precision in computerized adaptive tests where each examinee receives a unique, individually tailored subset of items. Rather than a single classical coefficient, it uses item response theory to express precision as conditional standard error of measurement at each abilit

2 sources · 1970
education

Conditional Standard Error of Measurement

The conditional standard error of measurement (CSEM) describes how much measurement error a test score carries at each point along the score scale, rather than as a single average. A test typically measures more precisely in some score ranges than others — often best near the middle and worst at the extremes — and the

2 sources · 1980
psychometrics

Construct Validity

Construct validity is the degree to which a test or scale actually measures the theoretical construct it is intended to measure. Introduced by Cronbach and Meehl in 1955, it is the central validity concern in psychological and educational measurement, evaluated by accumulating multiple lines of empirical and logical ev

2 sources · 1955
psychometrics

Content Validity

Content validity is evidence that a measurement instrument adequately samples the full domain of the construct it is intended to measure. It is established through systematic expert review and quantified with indices such as Lawshe's Content Validity Ratio (CVR) and Lynn's Content Validity Index (CVI), making it the fo

2 sources · 1975
psychometrics

Content Validity Ratio

The Content Validity Ratio (CVR) is a quantitative method developed by Charles Lawshe in 1975 for evaluating the extent to which items in a measurement instrument are relevant and representative of a target construct. The method aggregates expert panel judgments into a single validity coefficient for each item, enablin

3 sources · 1975
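
The per-item coefficient mentioned in the Content Validity Ratio entry above is usually computed with Lawshe's formula, CVR = (n_e − N/2) / (N/2), where n_e is the number of panelists rating the item "essential" and N is the panel size. A minimal Python sketch, with hypothetical function and argument names:

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR for a single item.

    n_essential: panelists who rated the item "essential"
    n_panelists: total number of panelists who rated the item
    Returns a value between -1 (nobody) and +1 (everybody).
    """
    if n_panelists <= 0 or not 0 <= n_essential <= n_panelists:
        raise ValueError("need 0 <= n_essential <= n_panelists and n_panelists > 0")
    half = n_panelists / 2
    return (n_essential - half) / half


# Illustrative panel of 10 experts.
for n_e in (10, 8, 5, 0):
    print(n_e, content_validity_ratio(n_e, 10))
```

A CVR of 0 means exactly half the panel rated the item essential. Whether a given positive value is high enough to keep the item depends on panel size and the critical-value table being used, which this sketch does not encode.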
psychometrics

Convergent Validity

Convergent validity is the degree to which multiple indicators that are theoretically expected to measure the same construct actually correlate with one another. It is one of the two complementary forms of construct validity identified by Campbell and Fiske (1959) and is now routinely assessed via factor loadings and t

2 sources · 1959
educational psychology

Course Experience Questionnaire

The Course Experience Questionnaire (CEQ) is an institutional assessment tool measuring students' perceptions of their learning environment and educational experience in a course. Developed by Wilson, Lizzio, and Ramsden (1997), it assesses dimensions including good teaching, clear goals, appropriate workload, appropri

2 sources · 1997
education

Differential Distractor Functioning

Differential distractor functioning (DDF) extends test-fairness analysis from the correct answer to the wrong ones. It asks whether examinees of equal ability but different group membership are differentially attracted to particular distractors (incorrect options) of a multiple-choice item. By analyzing option-level ra

2 sources · 2008
psychometrics

Differential Item Functioning

Differential item functioning identifies test or survey items that behave differently for examinees from different groups — such as gender, ethnicity, or language background — after controlling for the underlying ability or trait being measured. DIF analysis is essential for fairness evaluation in educational testing a

2 sources · 1970
education

Differential Item Functioning in Educational Testing

Differential item functioning (DIF) analysis is the central statistical tool for evaluating the fairness of test items in education. An item shows DIF when examinees of equal ability but different group membership — for example by gender, race/ethnicity, or language background — have unequal probabilities of answering

2 sources · 1993
psychometrics

DINA Model

The DINA Model (Deterministic Inputs, Noisy "And" gate) is a cognitive diagnostic model developed by Junker and Sijtsma (2001) that classifies examinees into latent skill classes based on their item response patterns. DINA assumes a deterministic relationship between skill mastery and correct responses, with probabilistic

3 sources · 2001
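
The DINA response rule can be sketched in Python as follows. This is an illustrative sketch under the usual formulation, in which an item is answered correctly with probability 1 − slip when every required skill is mastered and with probability guess otherwise. The names `dina_prob`, `alpha`, `q`, `slip`, and `guess` are assumptions, not taken from the catalogue.

```python
def dina_prob(alpha: list[int], q: list[int], slip: float, guess: float) -> float:
    """Probability of a correct response to one item under DINA.

    alpha: examinee's skill profile, 1 = mastered, 0 = not mastered
    q:     the item's row of the Q-matrix, 1 = skill required by the item
    slip:  chance of answering wrongly despite mastering all required skills
    guess: chance of answering correctly without all required skills
    """
    # Conjunctive (AND) rule: every required skill must be mastered.
    has_all_required = all(a == 1 for a, need in zip(alpha, q) if need == 1)
    return 1.0 - slip if has_all_required else guess


# Illustrative item requiring skills 1 and 2 out of three.
item_q = [1, 1, 0]
print(dina_prob([1, 1, 0], item_q, slip=0.1, guess=0.2))  # masters both
print(dina_prob([1, 0, 1], item_q, slip=0.1, guess=0.2))  # misses skill 2
```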
psychometrics

DINO Model

The DINO Model (Deterministic Input, Noisy "Or" gate) is a cognitive diagnostic model that replaces DINA's conjunctive (AND) skill requirement with a disjunctive (OR) one. DINO assumes an examinee needs to master only one of multiple possible skill pathways to answer an item correctly, making it suitable for scenarios where skills

3 sources · 2006
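
The disjunctive rule in the DINO entry above can be sketched in Python. This is an illustration under the usual formulation, in which mastering any one required skill is enough to reach the 1 − slip success probability. The names `dino_prob`, `alpha`, `q`, `slip`, and `guess` are assumptions.

```python
def dino_prob(alpha: list[int], q: list[int], slip: float, guess: float) -> float:
    """Probability of a correct response to one item under DINO.

    alpha: examinee's skill profile, 1 = mastered, 0 = not mastered
    q:     the item's row of the Q-matrix, 1 = skill measured by the item
    slip:  chance of answering wrongly despite mastering a measured skill
    guess: chance of answering correctly with none of the measured skills
    """
    # Disjunctive (OR) rule: one mastered skill among those measured suffices.
    has_any_required = any(a == 1 for a, need in zip(alpha, q) if need == 1)
    return 1.0 - slip if has_any_required else guess


# Illustrative item measuring skills 1 and 2 out of three.
item_q = [1, 1, 0]
print(dino_prob([0, 1, 0], item_q, slip=0.1, guess=0.2))  # masters one of two
print(dino_prob([0, 0, 1], item_q, slip=0.1, guess=0.2))  # masters neither
```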
psychometrics

Discriminant Validity

Discriminant validity is evidence that a latent construct is empirically distinct from other constructs it should differ from. Originating in Campbell and Fiske's multitrait-multimethod framework (1959), it is a core component of construct validity and a mandatory check in scale development and structural equation mode

2 sources · 1959
psychometrics

Floor and Ceiling Effect

Floor and ceiling effects are psychometric phenomena in which a disproportionately large proportion of respondents achieve the lowest (floor) or highest (ceiling) possible score on a measurement scale. These effects compromise scale reliability and responsiveness, limiting the instrument's ability to distinguish among

3 sources · 2000
psychometrics

Fuzzy-Set Qualitative Comparative Analysis

Fuzzy-Set Qualitative Comparative Analysis (fsQCA) is a set-theoretic method developed by Charles Ragin in the early 2000s that combines the configurational logic of qualitative case studies with the mathematical rigor of fuzzy sets. It bridges qualitative and quantitative research by allowing researchers to examine ca

3 sources · 2000
psychometrics

G-Theory

Generalizability Theory, developed by Lee J. Cronbach and colleagues in the 1960s and formalised by Brennan (2001), is an ANOVA-based framework that extends Classical Test Theory by decomposing observed score variance into multiple, separately identified sources of measurement error — such as raters, tasks, occasions,

2 sources · 1963
psychometrics

Generalizability Theory

Generalizability Theory is a psychometric framework that decomposes observed score variance into multiple sources — persons, items, raters, occasions, and their interactions — using analysis of variance. It replaces the single reliability coefficient of classical test theory with a family of coefficients that tell rese

2 sources · 1963
psychometrics

Guttman Scale

Guttman scaling is a methodology for constructing unidimensional scales with a cumulative property, developed by Louis Guttman in 1944. The method assumes that items form a perfect or near-perfect hierarchy: if a respondent endorses a harder item, they must endorse all easier items below it. This creates a reproducible

3 sources · 1944
psychometrics

Item Analysis

Item analysis is the foundational psychometric procedure for evaluating the quality of individual test or scale items within the Classical Test Theory (CTT) framework, as systematised by Allen and Yen (1979) and Crocker and Algina (1986). It produces an item difficulty index, an item discrimination index, and a distrac

2 sources · 1979
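
The difficulty and discrimination indices named in the Item Analysis entry above can be sketched for dichotomous (0/1) items. This is a minimal illustration: difficulty is taken as the proportion correct, and discrimination as the corrected item-total correlation, which is one common choice among several and an assumption here. All function and variable names are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def item_analysis(responses: list[list[int]]) -> list[tuple[float, float]]:
    """Return (difficulty, discrimination) for each item.

    responses[p][i] is 1 if person p answered item i correctly, else 0.
    """
    n_items = len(responses[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        # "Rest" score excludes the item itself to avoid inflating the correlation.
        rest = [sum(row) - row[i] for row in responses]
        difficulty = sum(item) / len(item)
        discrimination = pearson(item, rest)
        results.append((difficulty, discrimination))
    return results


# Made-up responses: four examinees, three items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for difficulty, discrimination in item_analysis(data):
    print(round(difficulty, 2), round(discrimination, 2))
```

Note that a higher "difficulty" value here means an easier item, since it is the proportion of examinees answering correctly.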
psychometrics

Item Response Theory

Item response theory models the probability that a respondent answers an item correctly (or endorses it) as a function of the respondent's latent trait level and the item's own statistical properties — difficulty, discrimination, and guessing. Unlike classical test theory, IRT places persons and items on the same scale

2 sources · 1952
psychometrics

Likert Scale Construction

Likert scale construction is a systematic methodology for developing attitude measurement instruments using summated rating scales. Introduced by Rensis Likert in 1932, it enables researchers to quantify latent constructs such as attitudes, beliefs, and psychological states by aggregating responses across multiple item

3 sources · 1932
psychometrics

Longitudinal Construct Validity

Longitudinal construct validity evaluates whether a psychological scale measures the same latent construct in the same way across multiple time points. It is tested by progressively constraining a confirmatory factor model across waves and comparing model fit, ensuring that observed change scores reflect genuine change

2 sources · 1993
psychometrics

Longitudinal content validity

Longitudinal content validity evaluates whether the items of a measure adequately and consistently represent the intended content domain not only at a single point in time but across repeated administrations. It ensures that the conceptual coverage of a scale remains appropriate and stable as measurement occasions accu

2 sources · 1995
psychometrics

Longitudinal convergent validity

Longitudinal convergent validity evaluates whether a scale's indicators correlate with theoretically related constructs not just at a single time point but consistently across repeated measurement occasions. It extends standard convergent validity testing into longitudinal designs to ensure that the scale measures the

2 sources · 1997
psychometrics

Longitudinal DIF

Longitudinal differential item functioning detects whether individual test or scale items behave differently across measurement occasions for the same respondents. It extends standard DIF methodology to repeated-measures designs, ensuring that observed change scores genuinely reflect construct change rather than shifts

2 sources · 1980
psychometrics

Longitudinal Discriminant Validity

Longitudinal discriminant validity tests whether a psychological construct measured at two or more time points is empirically distinct across occasions — ensuring that the same construct does not collapse into a single undifferentiated mass over time. It is a prerequisite for meaningful change modeling in panel and lon

2 sources · 1993
psychometrics

Longitudinal Generalizability Theory

Longitudinal generalizability theory extends classical G-theory to repeated-measures and longitudinal designs, decomposing score variance across persons, measurement occasions, raters, and items simultaneously. It quantifies how reliably scores can be generalized across time points, evaluators, and conditions — informa

2 sources · 1990
psychometrics

Longitudinal IRT

Longitudinal IRT extends classical item response theory to data collected at multiple time points, allowing researchers to model both the initial latent trait level and its change over time. It is used in educational assessment, clinical trials, and panel studies where the same items or item banks are administered repe

2 sources · 1991
psychometrics

Longitudinal McDonald's omega

Longitudinal McDonald's omega estimates scale reliability separately at each measurement occasion in a panel or repeated-measures study. By fitting a confirmatory factor model at each wave, it tracks how consistently a set of items measures its target construct over time, detecting erosion or improvement in measurement

2 sources · 1999
psychometrics

Longitudinal Measurement Invariance

Longitudinal measurement invariance testing determines whether a psychological scale measures the same construct in the same way across two or more time points. It is a prerequisite for interpreting mean-level change scores in panel and repeated-measures studies, ensuring that observed change reflects true change in th

2 sources · 1993
psychometrics

Longitudinal Nomological Validity

Longitudinal nomological validity evaluates whether a construct's theoretically predicted relationships with other constructs hold consistently across multiple measurement occasions. It extends the nomological network framework of Cronbach and Meehl (1955) to longitudinal designs, testing whether a scale behaves as the

2 sources · 1955
psychometrics

Longitudinal Reliability Analysis

Longitudinal reliability analysis evaluates the consistency and stability of measurement instruments across two or more time points. It extends classical reliability concepts — internal consistency, test-retest stability, and measurement precision — to repeated-measures designs, ensuring that observed score changes ref

2 sources · 1951