QSAR and Property Modeling
Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) modeling builds statistical models that predict a molecule's biological activity or physicochemical property from numerical descriptors of its structure.
Definition
Empirical, data-driven models that relate molecular structure, encoded as descriptors, to a measured property or biological activity for predictive purposes.
Scope
Covers the construction of QSAR and QSPR models, the descriptors and learning algorithms they use, the central importance of validation and the applicability domain, and applications to biological activity and to physicochemical and ADMET properties. Distinguishes interpretable classical models from modern machine-learned ones.
Core questions
- How is biological activity or a property correlated with molecular descriptors?
- How are QSAR models validated to ensure genuine predictivity?
- What is the applicability domain and why does it matter?
- How do classical QSAR and modern machine-learning models differ?
Key theories
- Hansch analysis
- Correlates biological activity with physicochemical descriptors such as lipophilicity, electronic effects, and steric parameters; this approach founded the quantitative structure-activity relationship.
- Validation and applicability domain
- Reliable QSAR requires rigorous external validation and a defined applicability domain, since models extrapolate poorly to structures unlike their training data.
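A Hansch-type model can be sketched as an ordinary least-squares fit of activity against a lipophilicity term (with a quadratic term to allow an optimum), an electronic term, and a steric term. The example below is a minimal illustration under stated assumptions: the function name `fit_hansch`, the choice of descriptors (logP, Hammett sigma, Taft Es), and all numeric values are hypothetical, and the data are synthetic and noise-free rather than measured.

```python
import numpy as np

def fit_hansch(log_p, sigma, e_s, activity):
    """Least-squares fit of a Hansch-type equation (illustrative form):

        log(1/C) = a*logP + b*logP**2 + rho*sigma + delta*Es + c
    """
    log_p = np.asarray(log_p, dtype=float)
    # Design matrix: one column per term, plus a constant column for c.
    X = np.column_stack([log_p, log_p**2, sigma, e_s, np.ones_like(log_p)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(activity, dtype=float), rcond=None)
    return dict(zip(["a", "b", "rho", "delta", "c"], coef))

# Synthetic descriptors and activities (assumed values, for illustration only).
rng = np.random.default_rng(0)
log_p = rng.uniform(-1.0, 5.0, 30)
sigma = rng.uniform(-0.5, 0.8, 30)
e_s = rng.uniform(-2.0, 0.0, 30)
activity = 1.2 * log_p - 0.2 * log_p**2 + 0.8 * sigma + 0.3 * e_s + 2.0

params = fit_hansch(log_p, sigma, e_s, activity)

# With a negative quadratic coefficient, activity peaks at logP = -a / (2b).
log_p_opt = -params["a"] / (2.0 * params["b"])
```

Because the synthetic activities contain no noise, the fit recovers the generating coefficients, and `log_p_opt` comes out at 3.0 by construction. Real data would add scatter, and the fitted coefficients would then need the validation described above before being trusted.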
Clinical relevance
QSAR and property models guide lead optimization, prioritize compounds for synthesis and testing, and predict absorption, distribution, metabolism, excretion, and toxicity (ADMET). They also inform regulatory assessment of chemical safety.
History
Founded by Hansch and Fujita's 1964 analysis correlating activity with physicochemical parameters, QSAR later expanded to three-dimensional and machine-learning variants, and the OECD subsequently codified validation principles for regulatory use.
Debates
- Validation rigor and overfitting
- High internal fit statistics can mask poor real-world predictivity, so the field continues to emphasize, and to debate, external validation and how the applicability domain should be defined.
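The gap between internal fit and external predictivity can be illustrated with a deliberately overparameterized model. The sketch below is an assumption-laden toy, not a recommended workflow: the data are synthetic, only one descriptor carries signal while fifteen are pure noise, and the external statistic uses the training-set mean as its reference, which is one of several conventions in use.

```python
import numpy as np

def r_squared(y_true, y_pred, y_ref_mean):
    """1 - SS_res / SS_tot, with SS_tot taken about a chosen reference mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_ref_mean) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)
n_train, n_ext, n_noise = 20, 200, 15  # few compounds, many descriptors

def make_set(n):
    """Synthetic set: one informative descriptor plus irrelevant ones."""
    signal = rng.normal(size=n)
    noise_desc = rng.normal(size=(n, n_noise))
    y = signal + rng.normal(scale=0.5, size=n)
    X = np.column_stack([np.ones(n), signal, noise_desc])
    return X, y

X_train, y_train = make_set(n_train)
X_ext, y_ext = make_set(n_ext)

# Ordinary least squares using every descriptor, informative or not.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

r2_fit = r_squared(y_train, X_train @ coef, y_train.mean())  # internal fit
r2_ext = r_squared(y_ext, X_ext @ coef, y_train.mean())      # external set
```

With 17 fitted parameters and only 20 training compounds, `r2_fit` is close to 1 almost regardless of whether the descriptors matter, while `r2_ext` on compounds the model never saw is far lower. Comparing the two numbers is the simplest check that a reported fit reflects predictivity rather than memorization.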
Key figures
- Corwin Hansch
- Toshio Fujita
- Alexander Tropsha
- Johann Gasteiger
Related topics
Seminal works
- hansch1964
- tropsha2010
Frequently asked questions
- What is the applicability domain of a QSAR model?
- It is the region of chemical space, defined by the training data, within which the model's predictions are considered reliable; predictions for very different molecules should be treated with caution.
- How is a QSAR model properly validated?
- Beyond internal cross-validation, it should be tested on an external set of compounds not used in training, since good internal statistics alone do not guarantee predictive performance.
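One common way to make the applicability domain concrete is the leverage of a query compound relative to the training descriptors. The sketch below assumes a linear descriptor space and uses the frequently quoted warning threshold h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds. The function name `leverage_domain` and the data are hypothetical, and other domain definitions (distance-based, density-based) are equally legitimate.

```python
import numpy as np

def leverage_domain(X_train, X_query):
    """Flag query compounds whose leverage exceeds h* = 3(p + 1)/n.

    Leverage is h = x^T (A^T A)^-1 x, where A is the training descriptor
    matrix with an intercept column and x is the query row built the same way.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    n, p = X_train.shape
    A = np.column_stack([np.ones(n), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(A.T @ A)
    # Row-wise quadratic form: h_i = q_i^T (A^T A)^-1 q_i
    h = np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)
    h_star = 3.0 * (p + 1) / n
    return h, h_star, h <= h_star

# Synthetic training descriptors (illustration only): 50 compounds, 3 descriptors.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(50, 3))

# Two queries: the training centroid, and a point far from all training data.
queries = np.vstack([X_train.mean(axis=0), [10.0, 10.0, 10.0]])
h, h_star, inside = leverage_domain(X_train, queries)
```

The centroid has the minimum possible leverage, 1/n, and falls inside the domain, while the distant point has a leverage well above h* and is flagged as an extrapolation. A prediction for a flagged compound is not necessarily wrong, but it rests on structure the training data did not cover and should be treated with the caution described above.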