Exploring Transition Metal Complexes with Large Language Models

Abstract

Here we use a large language model to explore the chemical space of transition metal complexes (TMCs), fine-tuning the open-source Llama-3.2-1B model on a variant of the SMILES string representation tailored for coordination chemistry. We identify several key molecular properties that are critical for producing valid TMCs and use them for prompt engineering, creating TMC-Llama models that generate unique, novel, and structurally validated TMCs at high success rates, exceeding the performance of both standard benchmarks and state-of-the-art generative methods in the literature. We use the pydentate workflow to show that the TMC-Llama model generates new TMC chemistries that are physically realistic, synthetically plausible, and geometrically consistent with large datasets of known experimental TMCs. Our study lays the groundwork for future applications of TMC-Llama generative models to discover new TMCs for functional materials in catalysis, photochemistry, medicine, and sustainability technologies.

Publication
submitted
Jacob W. Toney
Graduate Student
Roland St. Michel
Graduate Student
Heather J. Kulik
Professor of Chemical Engineering and Chemistry