QSAR and Property Modeling

Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) modeling builds statistical models that predict a molecule's activity or property from numerical descriptors of its structure.

Definition

Empirical, data-driven models that relate molecular structure, encoded as descriptors, to a measured property or biological activity for predictive purposes.

Scope

Covers the construction of QSAR and QSPR models, the descriptors and learning algorithms they use, the central importance of validation and the applicability domain, and applications to biological activity and to physicochemical and ADMET properties. Distinguishes interpretable classical models from modern machine-learned ones.

Core questions

How is biological activity or a property correlated with molecular descriptors?
How are QSAR models validated to ensure genuine predictivity?
What is the applicability domain and why does it matter?
How do classical QSAR and modern machine-learning models differ?

Key theories

Hansch analysis: Correlates biological activity with physicochemical descriptors such as lipophilicity and electronic and steric parameters; this approach founded the field of quantitative structure-activity relationships.
Validation and applicability domain: Reliable QSAR requires rigorous external validation and a defined applicability domain, since models extrapolate poorly to structures unlike their training data.
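
As a rough illustration of a Hansch-type correlation, the sketch below fits a linear model of activity against two descriptors by ordinary least squares. The descriptor values, the generating equation, and the names `fit_linear`, `descriptors`, and `activity` are assumptions made for illustration, not data or code from Hansch's work. The activities are synthetic and noise-free, so the recovered coefficients can be compared directly with the equation that produced them.

```python
# Minimal sketch of a Hansch-type linear fit (standard library only).
# All numbers below are synthetic and hypothetical.

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    Assumes the descriptor columns are not collinear.
    """
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical descriptors per compound: (lipophilicity, electronic parameter).
descriptors = [
    (0.5, -0.17), (1.0, 0.00), (1.5, 0.23),
    (2.0, 0.06), (2.5, 0.78), (3.0, -0.27),
]
# Synthetic activities generated from a made-up equation:
# activity = 2.0 + 0.8 * lipophilicity - 1.2 * electronic
activity = [2.0 + 0.8 * lp - 1.2 * s for lp, s in descriptors]

coefficients = fit_linear(descriptors, activity)
print([round(c, 3) for c in coefficients])
```

With real, noisy measurements the fitted coefficients would only approximate any underlying trend, which is why the validation steps described in this entry matter.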

Clinical relevance

QSAR and property models guide lead optimization, prioritize compounds for synthesis and testing, and predict absorption, distribution, metabolism, excretion, and toxicity (ADMET). They also inform regulatory assessment of chemical safety.

History

QSAR was founded by Hansch and Fujita's 1964 analysis correlating activity with physicochemical parameters. It later grew through three-dimensional and machine-learning variants, and the OECD codified validation principles for regulatory use.

Debates

Validation rigor and overfitting: High internal fit statistics can mask poor real-world predictivity, so the field continues to emphasize, and to debate, how external validation should be performed and how the applicability domain should be defined.
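
The gap between internal fit and external predictivity can be sketched with a deliberately overfit model. In the example below, a one-nearest-neighbor predictor memorizes its training set, so its training R² is perfect, yet its R² on held-out compounds is clearly lower. The data are synthetic, and the names `r_squared` and `nearest_neighbor_predict` are hypothetical helpers written for this illustration; the external R² here uses the mean of the external set, which is only one of several conventions.

```python
# Minimal sketch: training fit versus external-set performance.
# All values are synthetic; one descriptor per compound for simplicity.

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nearest_neighbor_predict(train_x, train_y, x):
    """Return the activity of the training compound closest to x."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]


# Training set: descriptor values and noisy synthetic activities.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.2, 0.7, 2.4, 1.9, 3.6, 2.8]

# External set: compounds never seen during model building.
ext_x = [0.4, 1.6, 2.4, 3.6, 4.4]
ext_y = [1.1, 1.9, 2.1, 2.9, 3.1]

train_pred = [nearest_neighbor_predict(train_x, train_y, x) for x in train_x]
ext_pred = [nearest_neighbor_predict(train_x, train_y, x) for x in ext_x]

r2_train = r_squared(train_y, train_pred)  # memorized, so the fit is perfect
r2_ext = r_squared(ext_y, ext_pred)        # honest estimate on unseen data

print(f"training R^2 = {r2_train:.2f}, external R^2 = {r2_ext:.2f}")
```

The memorizing model is an extreme case chosen to make the effect obvious; real QSAR models overfit more subtly, but the same comparison applies.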

Key figures

Corwin Hansch
Toshio Fujita
Alexander Tropsha
Johann Gasteiger

Seminal works

hansch1964
tropsha2010

Frequently asked questions

What is the applicability domain of a QSAR model?: It is the region of chemical space, defined by the training data, within which the model's predictions are considered reliable; predictions for molecules that differ substantially from the training compounds should be treated with caution.
How is a QSAR model properly validated?: Beyond internal cross-validation, it should be tested on an external set of compounds not used in training, since good internal statistics alone do not guarantee predictive performance.
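
One simple way to make the applicability domain concrete is a descriptor-range (bounding-box) check: a query compound counts as in-domain only if every descriptor falls within the range spanned by the training set. The sketch below assumes two hypothetical descriptors and invented values; `descriptor_ranges` and `in_domain` are illustrative names, and the bounding box is only one of several possible definitions (distance-based and leverage-based definitions are also used).

```python
# Minimal sketch of a bounding-box applicability-domain check.
# Descriptor values are invented for illustration.

def descriptor_ranges(training):
    """Return (min, max) for each descriptor column of the training set."""
    return [(min(col), max(col)) for col in zip(*training)]


def in_domain(ranges, descriptors):
    """True if every descriptor lies within the training range."""
    return all(lo <= v <= hi for (lo, hi), v in zip(ranges, descriptors))


# Hypothetical training descriptors: (lipophilicity, molecular weight).
training = [
    (1.2, 180.0), (2.5, 250.0), (3.1, 310.0),
    (0.8, 150.0), (2.0, 220.0),
]
ranges = descriptor_ranges(training)

query_similar = (2.2, 200.0)     # inside both training ranges
query_dissimilar = (5.6, 480.0)  # outside both training ranges

print(in_domain(ranges, query_similar))
print(in_domain(ranges, query_dissimilar))
```

A prediction for the second query would be an extrapolation, so it should be flagged rather than reported with the same confidence as the first.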