Large Language Models for Cu-Catalyzed CO₂ Hydrogenation

DFT offers mechanistic insight but remains restricted to idealized conditions, leaving catalyst discovery largely empirical. This limitation is acute for Cu-catalyzed CO₂-to-methanol conversion, where dynamic catalyst states and inconsistent literature reporting hinder comparison. We will present a benchmarked LLM workflow that structures the literature into validated, standardized datasets.

Catalyst discovery still relies largely on iterative trial and error, while DFT provides valuable mechanistic insight but is typically limited to idealized surfaces and simplified reaction environments. For Cu-catalyzed CO₂-to-methanol conversion, this gap is especially important because catalytic performance depends on dynamic Cu/Cu-oxide interfacial states, promoter effects, and synthesis and activation history under realistic operating conditions. The literature contains a rich but fragmented record of catalyst formulations, testing conditions, and performance, often reported with inconsistent context and metrics. The integration of computational chemistry and data science is accelerating catalyst discovery, but standardized, cross-comparable datasets are lacking. We present a Large Language Model (LLM)-enabled literature-mining workflow that extracts, structures, and validates catalytic performance data from the literature. This approach generates standardized catalytic profiles and an evidence-linked, validated dataset, enabling reliable cross-study comparison and data-driven hypothesis generation. Developed in collaboration with experimental partners in the Dutch ARC-CBBC consortium, the workflow is continuously benchmarked against experiments.
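The extract–structure–validate pattern described above could be sketched as a standardized record with automated sanity checks. The field names, value ranges, and DOI placeholder below are illustrative assumptions for a minimal sketch, not the actual schema used in the workflow:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalyticProfile:
    """One standardized, evidence-linked record extracted from a paper.

    Field names are hypothetical; they stand in for whatever
    schema the extraction workflow actually defines.
    """
    catalyst: str                 # e.g. "Cu/ZnO/Al2O3"
    promoter: Optional[str]       # promoter element, if any
    temperature_K: float          # reaction temperature
    pressure_bar: float           # total pressure
    methanol_selectivity: float   # fraction in [0, 1]
    co2_conversion: float         # fraction in [0, 1]
    source_doi: str               # evidence link back to the source paper


def validate(profile: CatalyticProfile) -> list:
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    if not 0.0 <= profile.methanol_selectivity <= 1.0:
        errors.append("selectivity out of range")
    if not 0.0 <= profile.co2_conversion <= 1.0:
        errors.append("conversion out of range")
    if profile.temperature_K <= 0 or profile.pressure_bar <= 0:
        errors.append("non-physical conditions")
    if not profile.source_doi:
        errors.append("missing evidence link")
    return errors


# Illustrative record with a placeholder DOI (not real data).
record = CatalyticProfile("Cu/ZnO/Al2O3", "Zn", 523.0, 50.0, 0.75, 0.20, "10.xxxx/example")
print(validate(record))  # []
```

Records that fail validation (e.g. selectivity above 1, or no evidence link) would be flagged for review rather than entering the cross-comparable dataset.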

Partners