Machine Learning Could Make Manufacturing New Medicines Easier for Synthetic Chemists

Researchers are combining the power of AI with chemistry.

Published Feb 20, 2018 1:18 PM EST

Image: usehung/flickr

Predicting Reactions

Mixing chemicals in a laboratory isn’t as easy as it looks on TV. Researchers can’t just pour everything into one beaker and hope for the best. Producing a new chemical compound, such as a new drug, with the highest possible yield requires the optimal combination of chemicals.

Finding that ideal mixture is not an easy task, but an artificial intelligence (AI) designed by researchers from Princeton University and Merck Research Laboratories might be able to help. The team’s machine learning algorithm can accurately predict chemical reaction yields, according to a study published in the journal Science.

“Many of these machine learning algorithms have been around for quite some time,” Princeton researcher Jesús Estrada said in a press release. “However, within the synthetic organic chemistry community, we really haven’t tapped into the exciting opportunities that machine learning offers.”

Chemists typically change one variable at a time when analyzing the outcome of different reactions. The team’s algorithm, which can predict outcomes when four different reaction components are varied at the same time, therefore represents a large leap forward.

Machine Learning Improvements

One of the biggest obstacles for multi-dimensional models, the kind that the team’s new AI uses, is calculating a “descriptor” for each individual chemical. A descriptor is an input value that represents information about a chemical (how many bonds it has, its molecular weight, and its geometry, to name a few), and calculating each descriptor is very time-consuming.

The researchers knew that calculating each descriptor one by one would be impractical for the large number of chemical combinations they wanted to use. So they used code built on an existing program called Spartan to calculate and extract the descriptors for each chemical automatically.
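To make the idea of a descriptor concrete, here is a minimal Python sketch of how a handful of numeric properties per chemical could be joined into a single input row for a model. It is an illustration only, not the team’s code: the `Descriptor` class, the `reaction_features` function, and the choice of fields are assumptions, and familiar small molecules stand in for real reaction components.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Descriptor:
    """A few numeric properties of one chemical (illustrative fields only)."""
    bond_count: int
    molecular_weight: float  # in g/mol


# Hypothetical lookup table; simple molecules stand in for reaction components.
DESCRIPTORS = {
    "water": Descriptor(bond_count=2, molecular_weight=18.015),
    "methane": Descriptor(bond_count=4, molecular_weight=16.043),
    "ethanol": Descriptor(bond_count=8, molecular_weight=46.069),
}


def reaction_features(components):
    """Concatenate the descriptors of every component into one flat input row."""
    row = []
    for name in components:
        row.extend(astuple(DESCRIPTORS[name]))
    return row


features = reaction_features(["water", "ethanol"])
print(features)  # [2, 18.015, 8, 46.069]
```

Each reaction becomes one such row, so varying several components at once simply means changing several groups of numbers in the row at the same time.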

With the descriptors in hand, the researchers then tested them with various models. They settled on a machine learning program that uses what’s called a “random forest” model, which builds many decision trees, each from a random sample of a small data set. By averaging the yields that the individual trees predict for a given reaction, the model comes up with an overall yield prediction.
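The averaging idea behind a random forest can be sketched in a few lines of plain Python. This is a deliberately simplified toy, not the researchers’ software: each “tree” here makes only a single split, the function names are hypothetical, and the descriptor values and yields are made-up numbers chosen purely for illustration.

```python
import random
from statistics import mean


def fit_stump(rows, targets):
    """Fit a one-split regression tree by minimizing squared error."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if not left or not right:
                continue  # a split must leave data on both sides
            lm, rm = mean(left), mean(right)
            err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:  # every sampled row was identical: predict a constant
        constant = mean(targets)
        return lambda row: constant
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def fit_forest(rows, targets, n_trees=50, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx]))
    return trees


def predict(trees, row):
    """The forest's prediction is the average of the individual tree predictions."""
    return mean(tree(row) for tree in trees)


# Made-up toy data: one descriptor value per reaction and its yield in percent.
rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
yields = [10.0, 12.0, 11.0, 80.0, 82.0, 85.0]

forest = fit_forest(rows, yields)
low = predict(forest, [1.5])
high = predict(forest, [5.5])
print(low, high)
```

A production random forest grows much deeper trees and also considers only a random subset of descriptors at each split, but the final step is the same: every tree votes, and the votes are averaged.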

Furthermore, the random forest model accurately predicted reaction yields from the results of only a hundred reactions, researcher Derek Ahneman explained. The model could even predict yields for chemical compounds not originally included in the training set.

In short, the machine learning algorithm was able to handle data at a scale that chemists working by hand could not.

The researchers hope that their algorithm will simplify the process of making synthetic compounds, particularly in developing new medicines. To help other chemists, they have made the software available for other labs to use. “The software that we developed can work for any reaction, any substrate,” Abigail Doyle, the A. Barton Hepburn Professor of Chemistry at Princeton University, said in a statement.

According to Doyle, vast resources and time are expended to make synthetic molecules, often in a largely ad hoc manner. This AI could change that. Using this new software, chemists could identify high-yielding combinations of chemicals and substrates more cheaply and efficiently than ever before.