Machine Learning for Chemistry
Machine learning models trained on chemical data and quantum-chemical calculations predict molecular properties, energies, and reactions, complementing and accelerating traditional computational chemistry.
Definition
The application of statistical learning and neural-network models to chemical problems, learning mappings from molecular representations to properties, energies, or new structures.
Scope
Covers data-driven models in chemistry: machine-learned interatomic potentials that approximate quantum-chemical energies at force-field speed, graph and message-passing neural networks for molecules, generative models for molecular design, and the challenges of data quality, representation, and extrapolation beyond training data.
Core questions
- How can learned potentials reproduce quantum accuracy at a fraction of the cost?
- How do graph neural networks operate directly on molecular structure?
- How are generative models used to propose novel molecules?
- How is generalization beyond the training distribution assessed and ensured?
Key theories
- Machine-learned interatomic potentials
- Neural-network potentials trained on quantum-chemical reference data reproduce energies and forces across configurations, enabling simulations with near-quantum accuracy at near-classical cost.
- Message-passing on molecular graphs
- Graph neural networks propagate information between neighboring atoms (bonded atoms, or atoms within a distance cutoff) to learn representations directly from molecular structure, achieving strong property-prediction accuracy without hand-crafted descriptors.
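The message-passing idea can be illustrated with a minimal sketch. This is a toy, not a trained model: the one-hot atom features, the helper names `message_passing_step` and `readout`, and the plain-sum update are all assumptions made for illustration. Real message-passing networks replace the sums with learned message and update functions.

```python
def message_passing_step(features, bonds):
    """One round of message passing: each atom adds its neighbours' feature
    vectors to its own (unweighted sum; a real model would learn this update)."""
    neighbours = {i: [] for i in range(len(features))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = []
    for i, own in enumerate(features):
        agg = list(own)
        for j in neighbours[i]:
            agg = [x + y for x, y in zip(agg, features[j])]
        updated.append(agg)
    return updated


def readout(features):
    """Sum over atoms, giving a molecule-level vector that does not depend
    on the order in which the atoms are listed."""
    return [sum(column) for column in zip(*features)]


# Water as a graph: atom 0 is O, atoms 1 and 2 are H.
# Features are one-hot element labels [is_O, is_H] (an illustrative choice).
water_features = [[1, 0], [0, 1], [0, 1]]
water_bonds = [(0, 1), (0, 2)]

atom_states = message_passing_step(water_features, water_bonds)
molecule_vector = readout(atom_states)
```

After one step the oxygen state encodes "one O with two H neighbours" and each hydrogen encodes "one H with one O neighbour". Listing the atoms in a different order changes the per-atom list but not the summed readout, which is the property that lets such models work on molecular graphs directly.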
Relevance
Machine learning is reshaping computational chemistry by accelerating property and energy prediction, expanding the reach of simulation through learned potentials, and enabling generative design of molecules and materials.
History
Behler and Parrinello introduced neural-network potentials in 2007, and graph neural networks rose to prominence from the mid-2010s. Together with large reference datasets, these developments drove rapid growth of machine learning across molecular and materials chemistry.
Debates
- Generalization and data requirements
- Whether learned models extrapolate reliably to chemistry outside their training data, and how much and what kind of data are needed for trustworthy predictions, remain central open questions.
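One common way to probe extrapolation is to hold out whole chemical families, so that the test set contains structure types the model never saw in training. The sketch below shows such a group-aware split in the spirit of scaffold splits. The function name `group_split`, the hand-assigned family labels, and the "largest groups go to training first" rule are assumptions for illustration, not a prescribed protocol.

```python
from collections import defaultdict


def group_split(records, group_of, test_fraction=0.2):
    """Split records so that no group appears in both train and test.

    Groups are visited from largest to smallest; a group joins the training
    set while it still fits under the training budget, otherwise it is held out.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[group_of(rec)].append(rec)

    # Deterministic order: size descending, then group name.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))

    train, test = [], []
    train_budget = (1.0 - test_fraction) * len(records)
    for _, members in ordered:
        if len(train) + len(members) <= train_budget:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Hypothetical records: (SMILES string, hand-assigned ring-system label).
records = [
    ("Oc1ccccc1", "benzene"), ("Nc1ccccc1", "benzene"),
    ("Cc1ccccc1", "benzene"), ("Clc1ccccc1", "benzene"),
    ("c1ccncc1", "pyridine"), ("Cc1ccncc1", "pyridine"), ("Nc1ccncc1", "pyridine"),
    ("C1CCCCC1", "cyclohexane"), ("CC1CCCCC1", "cyclohexane"),
    ("c1ccoc1", "furan"),
]

train, test = group_split(records, group_of=lambda rec: rec[1])
train_groups = {rec[1] for rec in train}
test_groups = {rec[1] for rec in test}
```

Comparing the error on such a held-out family with the error on a random split gives a rough indication of how much a model relies on having seen similar molecules during training.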
Key figures
- Jörg Behler
- Michele Parrinello
- Anatole von Lilienfeld
- Alán Aspuru-Guzik
Related topics
Seminal works
- behler2007
- gilmer2017
Frequently asked questions
- Will machine learning replace quantum chemistry?
- Not entirely; learned models depend on quantum-chemical or experimental reference data for training and are best seen as accelerators and complements rather than replacements for first-principles methods.
- What is a machine-learned interatomic potential?
- It is a model trained to reproduce the energies and forces from quantum calculations, allowing molecular dynamics with accuracy approaching that of quantum methods at greatly reduced cost.
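The structure of such a potential can be sketched as follows, loosely following the Behler-Parrinello form: the total energy is a sum of per-atom contributions, each computed from a descriptor of the atom's local environment, and forces are the negative gradient of that energy with respect to atomic positions. Everything numeric here is an assumption for illustration: the weights in `atomic_energy` are made up rather than trained, a single radial descriptor stands in for a full descriptor set, and finite differences stand in for the analytic or automatic gradients used in practice.

```python
import math


def cutoff(r, r_c):
    """Cosine cutoff that decays smoothly to zero at the cutoff radius r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0


def radial_descriptor(i, positions, eta=1.0, r_c=6.0):
    """Radial symmetry function of atom i: a sum over neighbours of a
    Gaussian in the interatomic distance, damped by the cutoff."""
    g = 0.0
    for j, other in enumerate(positions):
        if j != i:
            r = math.dist(positions[i], other)
            g += math.exp(-eta * r * r) * cutoff(r, r_c)
    return g


def atomic_energy(g, w1=1.5, b1=-0.2, w2=-0.8, b2=0.1):
    """Stand-in for a trained per-atom network: one tanh hidden unit with
    made-up weights (a real model is fitted to quantum reference data)."""
    return w2 * math.tanh(w1 * g + b1) + b2


def total_energy(positions):
    """Total energy as a sum of atomic contributions."""
    return sum(atomic_energy(radial_descriptor(i, positions))
               for i in range(len(positions)))


def forces(positions, h=1e-5):
    """Forces F = -dE/dR, estimated by central finite differences."""
    result = []
    for i in range(len(positions)):
        f_i = []
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += h
            minus[i][k] -= h
            f_i.append(-(total_energy(plus) - total_energy(minus)) / (2.0 * h))
        result.append(f_i)
    return result


# A water-like geometry in arbitrary length units (illustrative coordinates).
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
energy = total_energy(water)
water_forces = forces(water)
```

Because the descriptor depends only on interatomic distances and the energy is a sum over atoms, the energy is unchanged when the molecule is translated or its atoms are relabelled, and the forces on all atoms sum to zero. These built-in symmetries are part of why such models can be used directly in molecular dynamics.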