If you walk into a molecular biology lab and open the freezer, you’ll find it full of tiny tubes crowned with colorful screw-on caps and labeled with names like DNA polymerase, DNA ligase, protease, reverse transcriptase, luciferase, and restriction endonuclease. As their shared “-ase” suffix reveals, these faithful friends of molecular biologists are enzymes, ubiquitous proteins that catalyze biological reactions. They help researchers synthesize DNA, build plasmids, break down proteins, and measure the activity of gene-activating promoters as they try to unlock the secrets of life. Many of those mysteries themselves involve enzymes, which catalyze essential, life-sustaining chemical reactions in living cells.
Beyond the crowded cellular cytoplasm and messy laboratory freezers, you can also find enzymes at work in industry. They’re key to the production of our favorite commercial foods, crisp white paper, stain-smiting detergents, and pharmaceuticals: if you’ve received an mRNA vaccine to protect you from COVID-19, you can thank RNA polymerase for stitching a pool of loose ribonucleotides into the vaccine’s active ingredient. Motivated by these successes, as well as by environmental and practical concerns, many researchers are now looking to machine learning to expand the role of enzymes in industry.
Chemically speaking, enzymes are catalysts: substances that speed up a reaction without being consumed themselves. Catalysis offers many advantages in manufacturing, including faster reactions, less waste, fewer reaction steps, and a recyclable catalyst. As the climate warms and man-made waste spreads across the planet, interest in green chemistry has grown, and with it the use of catalysts.
Among catalysts, enzymes are especially attractive: they are generally easy to obtain (proteins can be produced and purified using fast-growing microbes such as E. coli); they work under mild, non-toxic conditions, usually in an aqueous environment at around body temperature; they are highly efficient; and they offer stereo- and regioselectivity. The development of a manufacturing route for sitagliptin, a widely used type 2 diabetes drug, illustrates the benefits of enzyme catalysis in chemical production: a transaminase obtained by directed evolution replaced an expensive high-pressure hydrogenation step, inorganic rhodium and iron catalysts, and a chiral purification step that removed unwanted stereoisomers, improving yields and reducing waste.
However, there is often no obvious enzyme to catalyze a given reaction, especially one that does not occur naturally in any organism. In the sitagliptin example, initial screening of known transaminases found none with the desired activity. The researchers then tested a truncated version of the substrate to uncover a candidate with even weak activity, and ran an extensive directed evolution campaign to amplify that activity until they arrived at the final transaminase. Even for those willing to undertake such an effort, deciding where to start requires a thorough knowledge of the many existing enzymes, their substrates, and their stereo- and regioselectivity.
Enter machine learning. Machine learning, a form of artificial intelligence, has become a natural choice for handling data sets too vast and complex for the human brain to fully analyze. It works by training a computational model on data relevant to a task of interest, so that the model learns the patterns needed to process new data correctly. For example, a model meant to recognize cats in images is trained on many pictures labeled as containing a cat or not. During training, it identifies features that are consistent across the cat images, which it then uses to decide whether new test images contain a cat. The model acts as a black box, meaning it does not clearly explain the criteria it has learned; one caveat is that this can hide bias and lead to inaccurate results.
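To make the idea of training a little more concrete, here is a minimal Python sketch of a tiny classifier. It assumes each image has already been reduced to three hypothetical yes/no features (whiskers, pointy ears, barking), whereas a real image model learns its own features from raw pixels; the method shown is plain logistic regression trained one example at a time, not the deep neural networks used in practice.

```python
import math

# Hypothetical, hand-made features per image: [has_whiskers, has_pointy_ears, barks].
# A real image model would learn its own features from raw pixels instead.
training_data = [
    ([1.0, 1.0, 0.0], 1),  # cat
    ([1.0, 1.0, 0.0], 1),  # cat
    ([1.0, 0.0, 0.0], 1),  # cat with folded ears
    ([0.0, 1.0, 1.0], 0),  # dog
    ([1.0, 0.0, 1.0], 0),  # dog with whiskers
    ([0.0, 0.0, 1.0], 0),  # dog
]

def sigmoid(z):
    """Squash a score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=1000, lr=0.5):
    """Fit logistic-regression weights with stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(score) - label  # gradient of the log-loss w.r.t. the score
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def is_cat(features, weights, bias):
    """Predict 'cat' when the model's probability is at least 0.5."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias) >= 0.5

weights, bias = train(training_data)
print(is_cat([1.0, 1.0, 0.0], weights, bias))  # True
print(is_cat([0.0, 1.0, 1.0], weights, bias))  # False
```

After training, the weight on the hypothetical barking feature ends up negative, the kind of consistent pattern the model finds on its own. In a large network, such learned criteria are spread across millions of weights rather than three, which is why the model behaves like a black box.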
In a recent paper published in Nature Communications, a team of researchers from IBM Research (which has a strong AI research profile and hosts a free, cloud-based AI tool for digital chemistry projects) presents a machine-learning tool that can help chemists identify suitable enzymes for industrial applications. Their model drew its knowledge of enzyme-substrate interactions from an extensive training set called ECREACT, which the authors created by combining four pre-existing databases. To sharpen the model’s chemical skill, the authors also added one million organic reactions from the US Patent Office database to the training regimen. Although these reactions differ from enzyme-catalyzed reactions, including them helped the model better understand how molecules interact in both enzyme-catalyzed and non-catalyzed reactions. The authors built the model to work in two directions: given an enzyme and its substrates, it predicts the product (forward); given a product, it predicts the substrates and the enzyme that could produce it (backward).
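The forward and backward tasks can be illustrated with a short Python sketch. It assumes a simplified, hypothetical text format in which each enzymatic reaction is written as `substrates|EC number>>product`, with molecules in SMILES notation and the enzyme identified by its Enzyme Commission (EC) number; the exact format used in ECREACT may differ, and the function names here are illustrative only. The sketch shows how one recorded reaction can yield a training pair for each direction, not the model itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnzymaticReaction:
    substrates: str  # SMILES of the substrate(s); several molecules are joined by "."
    ec_number: str   # Enzyme Commission number naming the enzyme class
    product: str     # SMILES of the main product

def parse_record(record: str) -> EnzymaticReaction:
    """Parse a record in the hypothetical 'substrates|EC>>product' format."""
    left, product = record.split(">>")
    substrates, ec_number = left.split("|")
    return EnzymaticReaction(substrates, ec_number, product)

def forward_pair(rxn: EnzymaticReaction) -> tuple[str, str]:
    """Forward task: substrates plus enzyme class in, product out."""
    return f"{rxn.substrates}|{rxn.ec_number}", rxn.product

def backward_pair(rxn: EnzymaticReaction) -> tuple[str, str]:
    """Backward task: product in, substrates plus enzyme class out."""
    return rxn.product, f"{rxn.substrates}|{rxn.ec_number}"

# Alcohol dehydrogenase (EC 1.1.1.1) oxidizing ethanol to acetaldehyde;
# the NAD+ cofactor is left out to keep the example short.
rxn = parse_record("CCO|1.1.1.1>>CC=O")
print(forward_pair(rxn))   # ('CCO|1.1.1.1', 'CC=O')
print(backward_pair(rxn))  # ('CC=O', 'CCO|1.1.1.1')
```

In a setup like this, a text-to-text model trained on many such pairs learns to map each input string to its output string, so the same set of recorded reactions can support both product prediction and enzyme suggestion.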
Their final model makes forward and backward predictions with good accuracy, indicating that it has successfully learned the characteristic substrates and products of each enzyme class. However, as with any machine-learning model, the results are shaped by the quality and quantity of the training data: enzyme classes with sparse data suffered from subpar prediction accuracy in both directions. The authors also noted that their training data consisted primarily of biosynthetic reactions involving natural substrates and products, a bias that may hinder those seeking synthetic routes to unnatural products or using unnatural substrates. Despite these limitations, the model, which is openly available for others to use as is or with additional training (for example, on proprietary data sets), represents an important step in the effort to use enzymes for green chemistry.