Research

Vision Overview: An AI to aid in synthesis planning of arbitrary chemical species by both short- and long-term computational approaches.

Our long-term vision is to produce an AI tool for synthetic chemistry: a piece of software designed to help the applications chemist find a practical pathway to difficult but compelling molecular species.

Ideally, this would start with the user photographing a line drawing of octaazacubane on their smartphone. It is then a relatively trivial exercise to convert the image to a SMILES string or a similar representation, thereby extracting the chemical structure from the drawing. A more elaborate series of transformations can then encode the molecule into a “latent space” of chemical structures; several of us (Aspuru-Guzik and Lopez) have recently made substantial progress with this. At this point, the AI utility would do exactly as most of us do on a daily basis: give a quick but preliminary answer with the best data available. This would be easier for the AI than for even the most learned synthetic chemist, as the full known reaction graph would be encoded in the AI, and efficient algorithms would be available for optimal synthetic route determination. Such a short-term query might be completed within one minute on a remote computational cluster, and the result returned immediately to the user. Following this, the AI would engage in a much longer period of continual reorganization and learning: the query would be entered into the AI’s database, and an extensive literature search and quantum chemical “nanoreactor” study (as pioneered by Martínez) of the local region of the query would be undertaken to provide an expanded reaction graph with improved fidelity in the region surrounding the query system.

The results from such a study would take longer than the initial “quick” response, and the AI would follow up with the user automatically to suggest refinements to the reaction approach. The results of this study would significantly enhance the quality of the “quick” answer for the next nearby query to the system. Beyond the obvious practical uses for day-to-day chemistry, there exist many intriguing possibilities for such a system. As one example, it is plausible to imagine that many repeated and clustered queries to the system would identify key “missing” transformations that the AI cannot locate to connect whole groups of chemical species. The AI will commence searches for reactions in this space of useful but unknown transformations using nanoreactor studies. It will also provide periodic reports of these identified gaps to experimentalists such as our team members Kanan, Burns, and Bradforth. This will allow for a “dialog” between the AI and expert chemists, where the chemists can guide the AI to search for transformations of special interest and where the AI can guide the chemists to think about especially useful classes of transformations that might be discovered.
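As a rough illustration of how clustered queries could expose “missing” transformations, the sketch below groups species into connected clusters of the known reaction graph and counts the queries whose start and target fall in different clusters. Everything here is an assumption made for illustration: the function names, the toy species labels, and the simplification of treating each known reaction as an undirected link. It is not the project's actual method.

```python
from collections import Counter

def components(species, reactions):
    """Label each species with a cluster id, treating reactions as undirected links."""
    neighbors = {s: set() for s in species}
    for a, b in reactions:
        neighbors[a].add(b)
        neighbors[b].add(a)
    label = {}
    for s in species:
        if s in label:
            continue
        # Depth-first walk; every species reached gets the seed species as its label.
        label[s] = s
        stack = [s]
        while stack:
            cur = stack.pop()
            for n in neighbors[cur]:
                if n not in label:
                    label[n] = s
                    stack.append(n)
    return label

def gap_report(species, reactions, queries):
    """Count queries whose endpoints lie in different clusters, most frequent first."""
    label = components(species, reactions)
    gaps = Counter()
    for start, target in queries:
        if label[start] != label[target]:
            gaps[frozenset((label[start], label[target]))] += 1
    return gaps.most_common()

# Hypothetical toy data: two clusters with no known reaction linking them.
toy_species = ["A", "B", "C", "X", "Y"]
toy_reactions = [("A", "B"), ("B", "C"), ("X", "Y")]
toy_queries = [("A", "X"), ("C", "Y"), ("A", "C")]
report = gap_report(toy_species, toy_reactions, toy_queries)
```

In this toy case, two of the three queries ask for a route between the same pair of disconnected clusters, so that pair would head the report. A real system would also need to respect reaction direction and conditions.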

Our long-term strategy to transform the face of synthetic chemistry is rather ambitious even for a five-year MURI project. Even further extensions are easily envisioned where the AI directs experiments (and analyzes their results) to accelerate its learning rate. Such extensions will require coupling to automated synthesis (e.g., with robots). Although these are outside the scope of the present project, we fully intend that the work sets the stage for such a project. We will accomplish the most important stepping-stones toward the goal of an AI synthesis assistant, encompassing both thermal and photo-activated chemistry. We intend to do this in a way that combines computational and experimental discovery of new reactions with deep-learning methods to organize the discovered and/or previously known reactions and to predict synthetic pathways. To that end, our collaborative projects include the following objectives:

Task 1: Automated Reaction Discovery in Exotic Mechanisms/Environments

Contemporary quantum chemistry can easily validate the existence and properties of a proposed chemical system, but finds it much harder to elucidate reactive pathways between chemical nodes. This task will develop and deploy “nanoreactor”-type tools (as developed by the Martínez Group) to perform automated computational discovery of reactions, with particular emphasis on enumerating new and important reactions involving exotic environments (e.g., the challenging molten salt chemistry currently being explored by the Kanan Group) and direct photochemical transformations involving excited states and non-adiabatic dynamics (e.g., the photochemical synthesis of strained ring systems of the Burns Group). The advanced reaction path-finding algorithms of the Lopez Group will provide important high-level benchmarks of the developed methodology.

Example CO2 fixation reaction observed in pilot nanoreactor simulations of the Kanan molten salt environment.

Rich athermal chemistry realized in the photodynamics of sulfine, as predicted by non-adiabatic ab initio molecular dynamics run in TeraChem.

Task 2: Machine Learning Approaches to Synthesis Planning

While the first task will establish a better roadmap for chemical space, navigating on it will remain a challenge. This task will develop novel machine learning algorithms (extending work by the Aspuru-Guzik Group, and with support from the Martínez and Lopez Groups) to provide efficient traversal of the reaction graph. This will enable both pragmatic queries related to optimal synthesis planning (e.g., paths that maximize yield or minimize cost) and deeper queries related to missing reaction graph links that might spur additional theoretical and experimental studies seeking new types of chemical transformations. One intriguing possibility to be pursued in this task is a direct and automated connection between the ML representation of the reaction graph and the nanoreactor toolkit of Task 1; even when no human queries are being performed, the ML will be working to expand the known reaction graph by directing new nanoreactor simulations.
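To make the idea of a “pragmatic query” concrete, here is a minimal sketch of route finding on a reaction graph that maximizes overall yield. It uses a standard shortest-path search (Dijkstra's algorithm) rather than the machine learning methods this task proposes. The graph layout, the function name, and the yields are hypothetical, and each step yield is assumed to lie in the interval (0, 1].

```python
import heapq
import math

def best_route(graph, start, target):
    """Return (overall_yield, path) for the highest-yield route, or (0.0, []) if none exists."""
    # Maximizing a product of yields is the same as minimizing the sum of -log(yield),
    # which is non-negative for yields in (0, 1], so Dijkstra's algorithm applies.
    heap = [(0.0, start, [start])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost), path
        if node in settled:
            continue
        settled.add(node)
        for product, step_yield in graph.get(node, []):
            if product not in settled:
                step_cost = -math.log(step_yield)
                heapq.heappush(heap, (cost + step_cost, product, path + [product]))
    return 0.0, []

# Hypothetical toy graph: species -> list of (product, fractional yield of that step).
toy_graph = {
    "A": [("B", 0.9), ("C", 0.5)],
    "B": [("D", 0.6)],
    "C": [("D", 0.95)],
}
overall, route = best_route(toy_graph, "A", "D")
```

Here the route through B (0.9 × 0.6 = 0.54) beats the route through C (0.5 × 0.95 = 0.475). Minimizing cost instead would only require replacing the `-log(yield)` edge weight with a per-step cost.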

Schematic diagram for a molecular autoencoder, a deep learning algorithm for predicting molecular properties and finding new molecules with targeted properties.

Schematic of autoencoder strategy for synthetic planning.

Iterative feedback cycle between synthesis path finding and nanoreactor simulation.

Architecture of Chempix: a smartphone app that recognises mathematical formulae in images and translates them into symbols

Task 3: Experimental Validation and Direct Discovery of Exotic Reactions

A key component of our MURI team is a strong partnership between theory and experiment. The experimental members of our team are pursuing promising new approaches to chemical reactivity. They will directly seek new types of reactions with their firsthand developments in molten salt chemistry (Kanan) and photochemistry (Bradforth and Burns), notably including photochemical synthesis of new strained ring systems that may be useful in high-energy-density applications. Beyond this, the exotic new environments and mechanisms explored by these team members will provide a key stress test of the theoretical framework. Additionally, the use of ultrafast experiments (Bradforth) will be invaluable in validating the fine details of the reaction mechanisms predicted by the theoretical methods. Finally, high-throughput screening techniques (Bradforth) will enable an “experimental nanoreactor,” wherein thousands of potential reactions are hypothesized and tested, directly expanding the known reaction graph of Task 2.