Exploring beyond experiment: generating high-quality datasets of transition metal complexes with quantum chemistry and machine learning

Jacob W. Toney, Aaron Garrison, Weiliang Luo, Roland St. Michel, Sukrit Mukhopadhyay, Heather J. Kulik

October 2025

Abstract

Machine learning (ML) approaches enable screening of the vast chemical space of transition metal complexes (TMCs) at faster speeds than either experimental approaches or ab initio calculations, but their quality is highly dependent on the reference data used. Existing TMC datasets often leverage experimental structures, which biases methods trained on this data away from reactive configurations. Calculating properties of these TMCs also introduces challenges of spin and oxidation state assignment. Recent work on generating hypothetical TMCs with realistic connectivity and geometry has demonstrated promise to extend datasets beyond experimental structures, especially when combined with ML approaches to identify complexes with desirable properties. Experimental measurements would be ideal to train and/or test these models but are often scarce for TMCs, especially for those that are catalytically active. Thus, properties calculated with electronic structure theory are a popular alternative choice for training ML models. However, TMCs are challenging for many conventional electronic structure methods, and few benchmark datasets exist to assess which methods are most reliable and cost-effective. Many of the recommended methods are computationally demanding, leading to the use of neural network potentials as surrogate models for large-scale screening. By utilizing emerging tools for TMC structure generation and suitable electronic structure methods, increasingly high-quality datasets will be curated to enhance the predictive power of ML approaches to discover novel TMCs, including in the development of neural network potentials. By more accurately predicting TMC properties, promising and practical candidates for catalysis, photosensitizers, molecular devices, and medicine will be identified.

Type

Journal article

Publication

Curr. Opin. Chem. Eng., 50, 101189 (2025)

Jacob W. Toney, Graduate Student

Aaron Garrison, Graduate Student

Weiliang Luo, Graduate Student

Roland St. Michel, Graduate Student

Heather J. Kulik, Professor of Chemical Engineering and Chemistry