Presentation Abstracts
Invited Speaker
Author: Emma King-Smith
Title: Looking to AI for the Future of Computational Chemistry
Abstract:
Computational chemistry has profound applications in numerous areas of organic chemistry, including drug design, materials discovery, and synthesis. However, within synthetic organic chemistry, computation is viewed as a retrospective tool rather than a predictive one. This talk explores how the rise of machine learning and AI can change that, taking first steps towards computationally guided organic synthesis. We investigate how small datasets can be combined with transfer learning to overcome the data scarcity that traditionally limits data-driven chemistry, developing models with state-of-the-art accuracy for predicting chemical reaction yields and regiochemistry.
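As a minimal illustration of the transfer-learning idea (not the speaker's actual models), the sketch below "pre-trains" a one-parameter linear model on a large hypothetical source dataset and then fits a two-point target dataset with an L2 penalty that pulls the weight toward the pre-trained value. The functions fit_slope and fine_tune_slope, the penalty strength, and all data are hypothetical.

```python
# Illustrative only: y = w * x, "pre-trained" on a large source dataset and then
# fitted to a tiny target dataset with a penalty keeping w near the pre-trained
# value. All data are hypothetical.

def fit_slope(xs, ys):
    """Least-squares slope for y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fine_tune_slope(xs, ys, w_pretrained, lam):
    """Closed-form minimiser of sum (y - w*x)^2 + lam * (w - w_pretrained)^2."""
    numerator = sum(x * y for x, y in zip(xs, ys)) + lam * w_pretrained
    denominator = sum(x * x for x in xs) + lam
    return numerator / denominator


# Large source task (e.g. many related reactions): slope 2.0
source_x = [float(i) for i in range(1, 101)]
source_y = [2.0 * x for x in source_x]
w_source = fit_slope(source_x, source_y)

# Small target task: only two data points, slightly different behaviour
target_x = [1.0, 2.0]
target_y = [2.2, 4.4]

w_scratch = fit_slope(target_x, target_y)                        # target data alone
w_transfer = fine_tune_slope(target_x, target_y, w_source, 5.0)  # shrunk toward w_source
```

With lam = 0 the fine-tuned slope equals the target-only fit; as lam grows it approaches the pre-trained slope, so lam controls how far a small dataset may override prior knowledge.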
Contributed Talks
Machine Learning for the Excited States of Radicals
Speaker: James Green
Additional Authors:
Institution: University of Oxford
Abstract:
Recent years have seen an explosion of interest in organic radicals owing to their potential use in OLEDs and qubits.1,2 However, accurately computing their electronic structure is very challenging because of their open-shell nature. Here we present a new computational method, ExROPPP, which accurately predicts spin-pure excited states of hydrocarbon radicals at a fraction of the computational cost of high-level methods, using the same parameters as previously used for closed-shell molecules.3 Furthermore, we demonstrate the use of supervised machine learning to improve upon this method further. We compiled a database of spectroscopic data for radicals and trained our model on these data. We find that machine learning improves the overall accuracy of our method, with a marked improvement for heterocyclic radicals. This paves the way for high-throughput computational screening of organic radicals, accelerating the discovery of new materials for OLEDs and molecular qubits.
References:
1. Ai et al., Nature 563, 536–540, 2018.
2. Gorgon et al., Nature 620, 538–544, 2023.
3. J. D. Green and T. J. H. Hele, J. Chem. Phys. 160, 164110, 2024.
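The abstract does not specify the features or model used for the machine-learning correction. As a stand-in, the sketch below assumes a simple linear recalibration of computed excitation energies against measured ones, E_exp ≈ a + b·E_calc, fitted by least squares to hypothetical values in eV; fit_linear and mean_abs_error are illustrative names.

```python
# Hypothetical calibration of computed excitation energies against experiment.
# A linear correction E_exp ~ a + b * E_calc stands in for the supervised ML step.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mean_abs_error(pred, ref):
    """Mean absolute deviation between predicted and reference values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


# Hypothetical training set (eV): computed vs. measured excitation energies
e_calc = [2.0, 2.5, 3.0, 3.5]
e_exp = [1.9, 2.35, 2.8, 3.25]

a, b = fit_linear(e_calc, e_exp)
e_corrected = [a + b * e for e in e_calc]
mae_before = mean_abs_error(e_calc, e_exp)
mae_after = mean_abs_error(e_corrected, e_exp)
```

In practice the correction would be judged on radicals held out from the fit; here the hypothetical data lie exactly on a line, so the in-sample error drops to essentially zero.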
Machine Learning Excited-State Potential Energy Surfaces of Solvated Nile Red with ESTEEM
Speaker: Jacob Eller
Additional Authors: Prof. Nicholas Hine
Institution: University of Warwick
Abstract:
Machine-learned interatomic potentials (MLIPs) can substantially accelerate theoretical spectroscopy calculations, enabling both ensemble sampling and trajectory post-processing to include vibronic effects, which are very challenging for traditional ab initio MD approaches. We demonstrate a workflow for efficiently generating MLIPs for the solvatochromic dye Nile red in a variety of solvents. We use iterative active learning to minimise the number and size of the DFT calculations required. Additionally, we compare the efficacy of two methodologies: generating a distinct MLIP for each adiabatic state, and combining one ground-state MLIP with delta-ML of the excitation energies. To evaluate the validity of the resulting models, we compare predicted absorption and emission spectra with experimental spectra.
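Ensemble sampling typically turns many snapshot excitation energies into a spectrum by broadening each transition. The sketch below assumes Gaussian broadening with an arbitrary width (sigma = 0.1 eV) and uses hypothetical snapshot energies and oscillator strengths; in the workflow described, these energies would come from MLIP-driven sampling. It does not cover the vibronic trajectory post-processing.

```python
import math


def broadened_spectrum(energies, strengths, grid, sigma=0.1):
    """Average of area-normalised Gaussians, one per sampled excitation (eV)."""
    prefactor = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    spectrum = []
    for e in grid:
        total = 0.0
        for e0, f in zip(energies, strengths):
            total += f * prefactor * math.exp(-0.5 * ((e - e0) / sigma) ** 2)
        spectrum.append(total / len(energies))  # ensemble average
    return spectrum


# Hypothetical excitation energies (eV) from five sampled snapshots
snapshot_energies = [2.30, 2.35, 2.40, 2.45, 2.50]
oscillator_strengths = [1.0] * len(snapshot_energies)

energy_grid = [1.8 + 0.01 * i for i in range(121)]  # 1.80 to 3.00 eV
spectrum = broadened_spectrum(snapshot_energies, oscillator_strengths, energy_grid)
peak_energy = energy_grid[spectrum.index(max(spectrum))]
```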
Incorporating Explicit Solvent into Reaction Modelling
Speaker: Veronika Juraskova
Additional Authors: Hanwen Zhang, Fernanda Duarte
Institution: University of Oxford
Abstract:
Dynamics and solvation effects play a fundamental role in modelling chemical processes in the liquid phase. They influence the structure and stability of all species participating in the process, including intermediates and transition states, thus dictating reaction rates, selectivity, and even the mechanism. Yet accurately modelling these effects remains challenging, particularly in protic solvents, which require an explicit description of solute-solvent interactions.
Here, I will discuss computational strategies for generating reactive machine-learned potentials (MLPs) to model chemical processes in solution. Our approach builds on automated active learning to create data-efficient training sets that accurately reproduce DFT energies and forces. Leveraging the Atomic Cluster Expansion framework, combined with linear regression or message-passing neural networks (MACE), we demonstrate how MLPs accelerate molecular dynamics simulations of reactions in solution, paving the way for modelling chemical processes under experimental conditions.
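The abstract does not state how the active-learning loop chooses new configurations. One common criterion, assumed here, is query by committee: configurations on which several independently trained potentials disagree most are sent for DFT labelling. The function name, the energies, and the 0.05 eV threshold are hypothetical.

```python
import statistics


def select_for_labelling(committee_predictions, threshold):
    """Indices of configurations whose committee energies disagree by more than
    `threshold` (population standard deviation, same units as the energies)."""
    return [
        i
        for i, predictions in enumerate(committee_predictions)
        if statistics.pstdev(predictions) > threshold
    ]


# Hypothetical energies (eV) from a four-member committee for three configurations
committee = [
    [-10.00, -10.01, -9.99, -10.00],  # members agree
    [-8.50, -8.10, -8.90, -8.30],     # members disagree: uncertain region
    [-9.20, -9.21, -9.19, -9.20],     # members agree
]
to_label = select_for_labelling(committee, threshold=0.05)
# Configurations in `to_label` would be computed with DFT and added to training.
```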
Machine Learned Potentials for Modelling the Shape of Flexible Organic Molecules
Speaker: Richard Bryce
Additional Authors: Christopher D. Williams, Jas Kalayan, Neil A. Burton
Institution: University of Manchester
Abstract:
While developments in algorithms and hardware are enabling longer-timescale molecular dynamics simulations, the accuracy of these simulations can be limited by the quality of the underlying force field. Machine learning (ML) potentials show considerable promise for accurately modelling molecular conformation. Here we consider the ability of current ML and semi-empirical quantum chemical approaches to model the conformational behaviour of drug-like molecules and monosaccharides. We also discuss developments in a neural network scheme called PairFENet and consider its performance for small organic molecules in the gas phase and in solution. The importance of adequate conformer sampling in the reference training set will also be discussed.
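The inputs used by PairFENet are not described in this abstract. As an assumption, the sketch computes one common pairwise representation for molecular ML potentials, the inverse distance 1/r_ij for every atom pair, which is unchanged when the molecule is translated or rotated. The function name and geometry are hypothetical.

```python
import math
from itertools import combinations


def inverse_distance_features(coords):
    """1/r_ij for every unique atom pair i < j, in a fixed pair order."""
    return [
        1.0 / math.dist(coords[i], coords[j])
        for i, j in combinations(range(len(coords)), 2)
    ]


# Hypothetical three-atom geometry (Å)
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
features = inverse_distance_features(coords)

# Translating the molecule leaves the features unchanged
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in coords]
features_shifted = inverse_distance_features(shifted)
```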
Kinetic predictions for SN2 and E2 reactions using the BERT architecture: Comparison and interpretation
Speaker: Chloe Wilson
Additional Authors: Jason Crain, Fernanda Duarte
Institution: University of Oxford
Abstract:
Accurate prediction of reaction rates is integral to elucidating reaction mechanisms and designing synthetic pathways. Machine learning (ML) enables fast, accurate prediction of experimental rate constants, overcoming limitations of traditional quantum mechanical (QM) methods. In our previous work, we demonstrated the efficacy of Bidirectional Encoder Representations from Transformers (BERT) models in predicting experimental log k values for SN2 reactions, validating BERT’s predictions against known reactivity rules. Our current work extends this framework to multi-class rate prediction, fine-tuning BERT to predict experimental log k values for both E2 and SN2 reactions, and interpreting the predictions to assess whether BERT has learned the structural and physical effects that drive E2/SN2 competition. Our E2/SN2 rate prediction BERT achieves an RMSE of 1.2 ± 0.1 log k units on similarity-split test data, exceeding the accuracy of the current E2/SN2 rate prediction model from the literature. Furthermore, the accuracy and chemical validity of the E2/SN2 BERT predictions are maintained relative to models trained on each mechanism individually. By validating predictions against established reactivity rules, we believe this work will increase chemists’ confidence in using machine-learned kinetics to guide synthetic design.
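The abstract reports the error as 1.2 ± 0.1 log k without saying how the uncertainty was obtained. Assuming it is the mean and sample standard deviation of the RMSE over several similarity-based test splits, the summary could be computed as follows; the split data and function names are hypothetical.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted log k values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def summarise_splits(splits):
    """Mean and sample standard deviation of the RMSE over several test splits.

    `splits` is a list of (y_true, y_pred) pairs, one per split.
    """
    scores = [rmse(y_true, y_pred) for y_true, y_pred in splits]
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical results from three similarity-based splits
splits = [
    ([0.0, 0.0], [1.0, -1.0]),
    ([0.0, 0.0], [1.2, -1.2]),
    ([0.0, 0.0], [1.4, -1.4]),
]
mean_rmse, std_rmse = summarise_splits(splits)
```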
Automating Transition State Search in Metal Catalysed Reactions
Speaker: Shoubhik Raj Maiti
Additional Authors: Fernanda Duarte, David Buttar
Institution: University of Oxford
Abstract:
Finding transition states (TSs) is a key step in elucidating the mechanisms underlying chemical reactions, facilitating the optimisation of synthetic procedures and the discovery of new catalysts. Traditional approaches to finding TSs using DFT or similar methods have become routine. However, despite advances in the field, characterising TSs still requires significant human time and effort. Automating TS search has the potential to address these challenges. Indeed, several advances have been made in this area using molecular graph-based methods,[1,2] or by systematic exploration of the potential energy surface (PES);[3] however, most have focused on organic reactions and struggle to describe transition metal (TM)-catalysed reactions, particularly their transition states and intermediates. This is due to the complex PESs of these systems, which arise from their complex electronic structure and flexible coordination. The same features make molecular graphs difficult to construct, which further complicates automated TS search for TM-catalysed reactions. Given the relevance of these reactions to the pharmaceutical and materials industries, automated in silico elucidation of reaction paths and their kinetics holds clear promise for optimising existing catalysts and designing new ones.
In this study, we present our efforts to design an automated workflow for TS search and reaction path elucidation for TM-catalysed reactions, building on our software autodE.[2] We discuss our implementation in autodE of the recently published double-ended TS search method i-EIP (improved Elastic Image Pair).[4] We then compare its robustness and efficiency against popular double-ended methods, including NEB-TS (Nudged Elastic Band – Transition State),[5] DE-GSM (Double-Ended Growing String Method)[6] and DHS (Dewar-Healy-Stewart),[7] across a series of TM-catalysed reactions. The results indicate that the most popular methods are not always the most efficient. Additionally, we introduce a fast method for generating molecular graphs of metal complexes from low-level tight-binding calculations, which can improve the reliability of the graph-based reaction representations used in autodE. We aim for this study to contribute to the broader application of automated reaction path-finding methods, paving the way for faster development of more efficient and selective catalysts.
References:
[1] L. D. Jacobson, A. D. Bochevarov et al., J. Chem. Theory Comput. 2017, 13, 5780
[2] T. A. Young, J. J. Silcock, A. J. Sterling, F. Duarte, Angew. Chem. Int. Ed. 2021, 60, 4266
[3] S. Maeda, K. Morokuma, J. Chem. Phys. 2010, 132, 241102
[4] Y. Liu, H. Qi, M. Lei, J. Chem. Theory Comput. 2023, 19, 2410
[5] A. Asgeirsson, H. Jonsson, et al., J. Chem. Theory Comput. 2021, 17, 4929-4945
[6] P. Zimmerman, J. Chem. Phys. 2013, 138, 184102
[7] M. J. S. Dewar, E. F. Healy, J. J. P. Stewart, J. Chem. Soc., Faraday Trans. 2 1984, 80, 227
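The abstract does not describe how the tight-binding output is turned into a molecular graph. One plausible approach, assumed here, is to take pairwise bond orders (for example Wiberg bond orders from a GFN2-xTB calculation) and connect every atom pair whose bond order exceeds a cutoff. The 0.5 cutoff, the function name, and the bond orders for the toy Pd fragment are hypothetical.

```python
def graph_from_bond_orders(symbols, bond_orders, threshold=0.5):
    """Adjacency sets: atoms i and j are bonded if bond_orders[i][j] >= threshold."""
    n = len(symbols)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if bond_orders[i][j] >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


# Hypothetical symmetric bond-order matrix for a toy Pd complex fragment
symbols = ["Pd", "P", "Cl", "C"]
bond_orders = [
    [0.00, 0.75, 0.62, 0.20],
    [0.75, 0.00, 0.02, 0.95],
    [0.62, 0.02, 0.00, 0.00],
    [0.20, 0.95, 0.00, 0.00],
]
graph = graph_from_bond_orders(symbols, bond_orders)
```

A single global cutoff is the simplest choice; because metal-ligand bond orders can be lower than typical covalent organic ones, element-dependent cutoffs may be more appropriate in practice.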
ichor: A Python Library for Computational Chemistry Data Management and Machine Learning Force Field Development
Speaker: Yulian T. Manchev
Additional Authors: Matthew J. Burn, and Paul L. A. Popelier
Institution: University of Manchester
Abstract:
We present ichor, an open-source Python library that simplifies data management in computational chemistry and streamlines machine learning force field development. Ichor implements many easily extendable file management tools, together with a lazy file-reading system, allowing hundreds of thousands of computational chemistry files to be managed efficiently. Data from calculations can be stored directly in databases for easy sharing and post-processing, and raw data can be processed by ichor into machine learning-ready datasets. Beyond these data-related capabilities, ichor provides interfaces to popular workload management software used on High Performance Computing clusters, so that thousands of separate calculations can be submitted with a single line of Python code. Furthermore, a menu-driven command-line interface makes common ichor tasks more accessible and efficient. Finally, ichor implements general tools for visualising and analysing datasets and for measuring machine learning model quality, both on test-set data and in simulations. With these functionalities, ichor can serve as an end-to-end solution for data procurement, data management, and analysis in machine learning force field development.
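The abstract mentions a lazy file-reading system but not its interface. The sketch below shows the general pattern with a hypothetical LazyFile class (not ichor's API): the file is only read from disk the first time its contents are requested, and the result is cached afterwards.

```python
import tempfile
from functools import cached_property
from pathlib import Path


class LazyFile:
    """Hypothetical wrapper: reads the file only when its contents are first used."""

    def __init__(self, path):
        self.path = Path(path)
        self.disk_reads = 0  # counts actual reads, for illustration

    @cached_property
    def lines(self):
        # Runs once; later accesses return the cached list.
        self.disk_reads += 1
        return self.path.read_text().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / "job.out"
    output.write_text("energy = -76.4\nnormal termination\n")

    handle = LazyFile(output)      # creating the object does not touch the file
    reads_before = handle.disk_reads
    first_line = handle.lines[0]   # first access triggers a read
    last_line = handle.lines[-1]   # served from the cache
    reads_after = handle.disk_reads
```

Deferring the read means that creating objects for hundreds of thousands of files costs almost nothing until their contents are actually needed.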
Using GPT and other Deep Neural Network Models to Create Virtual Screening Libraries for Docking and DFT-based Protein-ligand Analysis
Speaker: Mauricio Cafiero
Additional Authors:
Institution: University of Reading
Abstract:
Several generative, pre-trained models were developed to create virtual screening libraries of molecules with affinity for the enzyme HMG-coenzyme A reductase (HMGCR). These models were pre-trained on general drug molecule structures and then fine-tuned to generate HMGCR inhibitors. The libraries were then screened using a deep neural network trained on HMGCR inhibitors to predict IC50 values, and by docking the molecules into the HMGCR binding site. The IC50 values and docking scores correlated well for most libraries, with a t-test indicating that the correlation was significant at the 95% confidence level. The molecules produced by the models were then grouped into clusters using k-means analysis and classified by inspection. More fine-tuning and less pre-training led to more potent inhibitors but less stable models, whereas less fine-tuning and more pre-training still produced good inhibitors with more stable models. The length of the prompt (input) given to the generative models also had a large effect, with shorter prompts producing more robust libraries. According to the k-means analysis, ~42% of the molecules in the libraries were statin-like, and the docking poses of the most potent of these were selected for analysis by Density Functional Theory.
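The clustering step can be illustrated with a plain k-means (Lloyd's algorithm) implementation. The two-dimensional vectors below are hypothetical stand-ins for molecular descriptors, and the deterministic farthest-point initialisation is a choice made here for reproducibility; libraries such as scikit-learn (sklearn.cluster.KMeans) provide production implementations.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point farthest
    # from all centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2D descriptors for six generated molecules
descriptors = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(descriptors, k=2)
```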
Machine Learning of Isomerization in Porous Molecular Frameworks: Exploring Functional Group Pair Distance Distributions
Speaker: Matt Addicoat
Additional Authors: Maryam Nurhuda, Cansu Dogan, Yusuf Hafidh, Carole C. Perry, Daniel Packwood
Institution: Nottingham Trent University
Abstract:
Molecular Framework Materials (MFMs), including Metal Organic Frameworks (MOFs), Covalent Organic Frameworks (COFs) and their discrete equivalents, Metal Organic Polyhedra (MOPs) and Porous Organic Cages (POCs), are porous materials composed of molecular fragments bound in one of many topologies.
The global structure of such MFMs is well defined by specifying their topology and building blocks; their local structure, however, is not. In particular, when a linker has been functionalised, the resulting loss of symmetry leads to many possible isomers of the MFM.
In this contribution, we develop a fingerprint (descriptor) for functionalised molecular framework structures. We describe a periodic or discrete MFM by its pore shape and derive a fingerprint based on the occurrence of pairwise distances between functional groups in each pore. We enumerate the possible functional group arrangements in the 14 most common pore shapes created by ditopic (2-connected) linkers and present fingerprints for each mono-functionalised pore. We show that this descriptor accurately captures the pore environment for modelling adsorption processes.
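A minimal sketch of a pair-distance fingerprint, assuming a fixed-width histogram (1 Å bins) of the distances between functional-group sites in a single discrete pore. The bin width, site coordinates, and function name are hypothetical, and periodic boundary conditions are ignored.

```python
import math
from itertools import combinations


def pair_distance_fingerprint(sites, bin_width=1.0, n_bins=12):
    """Histogram of distances between every pair of functional-group sites."""
    counts = [0] * n_bins
    for a, b in combinations(sites, 2):
        index = int(math.dist(a, b) // bin_width)
        if index < n_bins:  # distances beyond the last bin are ignored
            counts[index] += 1
    return counts


# Two hypothetical arrangements of four functional groups (coordinates in Å)
square = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 3.0, 0.0), (0.0, 3.0, 0.0)]
line = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]

fp_square = pair_distance_fingerprint(square)
fp_line = pair_distance_fingerprint(line)
```

The two arrangements of four sites give different fingerprints, which is the property that lets such a descriptor distinguish isomers.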
Morphological Insights from Microscopy: A Transformer-Autoencoder for 2D-3D LNO Particle Reconstruction from Synthetic SEM Data
Speaker: Steven Tendyra
Additional Authors: Sabrina Sicolo, Marcel Sadowsk, Peter Spackman, Alvin J Walisinghe, Lars Matthes, Michael W Anderson
Institution: University of Manchester
Abstract:
Extracting meaningful information concerning particle morphology from 2D scanning electron micrographs remains a challenge in particle engineering. We present a novel transformer-autoencoder methodology for accurate 3D reconstruction of particles from synthetic 2D microscopy data.
Focusing on lithium nickel oxide (LNO), a cathode active material whose single-crystal form can be obtained by molten salt synthesis (MSS) and exhibits several distinct morphologies [1], we simulate tens of thousands of variations in particle morphology using the CrystalGrower [2, 3] package, a generic Monte Carlo code that can simulate the growth of any crystal.
By pairing 2D SEM-like renderings of simulated LNO particles with a coarse-grained 3D representation, we leverage the generative power of autoencoders and the context/sequence-tracking abilities of transformers to accurately reconstruct any single-crystal LNO particle from image-based data. Similar methodology has previously been used to classify and cluster crystal shapes from voxel data using aspect ratio/Zingg analysis [4], and we aim eventually to generate accurate 3D representations of real LNO particles from SEM data.
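Aspect-ratio (Zingg) analysis classifies a particle from its long (L), intermediate (I), and short (S) dimensions using the ratios I/L and S/I, with the classical class boundary at 2/3. The sketch below assumes the three dimensions have already been measured, for example from voxel data; it illustrates the classification only, not the transformer-autoencoder, and the function name and dimensions are hypothetical.

```python
def zingg_class(a, b, c, cutoff=2 / 3):
    """Zingg shape class from three principal dimensions given in any order."""
    long_, intermediate, short = sorted((a, b, c), reverse=True)
    i_over_l = intermediate / long_
    s_over_i = short / intermediate
    if i_over_l > cutoff and s_over_i > cutoff:
        return "equant"
    if i_over_l > cutoff:
        return "oblate (disc)"
    if s_over_i > cutoff:
        return "prolate (rod)"
    return "bladed"


# Hypothetical particle dimensions (arbitrary length units)
examples = {
    "cube-like": zingg_class(1.0, 1.0, 1.0),
    "plate": zingg_class(10.0, 9.0, 1.0),
    "needle": zingg_class(10.0, 2.0, 1.8),
    "blade": zingg_class(10.0, 5.0, 1.0),
}
```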
This has important implications for crystal engineering: when coupled with appropriate digital tools such as CrystalGrower, such methodology provides a reliable bridge between experiment and simulation. A wealth of underlying information, such as shape/size distributions and thermodynamics, can then be extracted via existing simulation optimisation workflows. Crucially, this would allow a greater understanding of crystallisation processes, finer tailoring of synthetic outcomes, and better overall particle design.
We are confident that such methodology is generalisable and can be applied to any crystal system exhibiting a polyhedral primary particle, provided the model can be trained on sufficiently diverse particle-shape data. This work also has wider implications for computer vision and machine learning, in that it provides a new route to reconstructing 3D shapes from 2D representations.
References:
[1] Kim, M.; Zou, L.; Son, S.-B.; Bloom, I. D.; Wang, C.; Chen, G. Improving LiNiO2 Cathode Performance through Particle Design and Optimization. Journal of Materials Chemistry A, 2022, 10, 12890–12899. https://doi.org/10.1039/d2ta02492f.
[2] Anderson, M. W.; Gebbie-Rayet, J. T.; Hill, A. R.; Farida, N.; Attfield, M. P.; Cubillas, P.; Blatov, V. A.; Proserpio, D. M.; Akporiaye, D.; Arstad, B.; et al. Predicting Crystal Growth via a Unified Kinetic Three-Dimensional Partition Model. Nature, 2017, 544, 456–459. https://doi.org/10.1038/nature21684.
[3] Hill, A. R.; Cubillas, P.; Gebbie-Rayet, J. T.; Trueman, M.; de Bruyn, N.; Harthi, Z. al; Pooley, R. J. S.; Attfield, M. P.; Blatov, V. A.; Proserpio, D. M.; et al. CrystalGrower: A Generic Computer Program for Monte Carlo Modelling of Crystal Growth. Chemical Science, 2021, 12, 1126–1146. https://doi.org/10.1039/d0sc05017b.
[4] Cha, J.; Basak, S.; Hill, A. R.; Gebbie-Rayet, J. T.; Walisinghe, A. J.; Tendyra, S.; Anderson, M. W.; Thiyagalingam, J. Artificial Intelligence and Machine Learning to Crack Crystal Growth. Manuscript in preparation, 2024.