The Emergence of AI in Biochemistry
Abstract
The emergence of artificial intelligence (AI) has accelerated biochemical research, because its predictive machine learning (ML) algorithms make rapid experimentation possible. Previous studies have shown the power of AI-assisted modeling and simulation in predicting and recognizing potential genetic defects, modifying the human genome, and accelerating drug discovery; the latter is achieved by using AI to identify drug targets and to predict the interactions between drugs and the human body. Here, we examine several prevailing applications of AI in genetics and drug development, concentrating on AI’s involvement in epigenetics and on AlphaFold, an AI system developed by DeepMind. This review of the current literature points toward further developments in disease recognition and drug discovery.
AI in Protein Folding
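AlphaFold is a deep learning system, described by its authors as the first computational method to regularly predict protein structures with atomic accuracy (Jumper et al., 2021). It predicts the 3D coordinates of all heavy atoms (all atoms other than hydrogen) of a given protein, using the primary amino acid sequence and aligned sequences of homologs as inputs. The network comprises two main stages.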
Stage 1:
(1a): The trunk of the network processes the inputs through repeated layers of a novel neural network block called Evoformer. (A neural network is a model loosely inspired by the human brain whose parameters are learned from data by repeated trial and error; each layer is one processing step within it.)
(1b): The trunk produces an Nseq × Nres array (Nseq, number of sequences; Nres, number of residues, the individual monomers within the polymeric chain of a polysaccharide, protein, or nucleic acid) that represents a processed MSA (multiple sequence alignment, the alignment of three or more biological sequences of similar length for comparison) and an Nres × Nres array that represents residue pairs (pairs of monomers).
(1c): The MSA representation is initialized with the raw MSA, and the Evoformer blocks contain several attention-based and non-attention-based components. Jumper et al. (2021) present evidence, from interpreting the neural network, that a concrete structural hypothesis arises early within the Evoformer blocks and is continuously refined. The key innovations in the Evoformer block are new mechanisms for exchanging information within the MSA and pair representations, which enable direct reasoning about spatial and evolutionary relationships.
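The shapes of the two arrays in (1b) can be made concrete with a small sketch. This is a toy illustration only, not AlphaFold's actual featurization: the alignment, the integer encoding, the helper names `encode_msa` and `init_pair`, and the choice to fill the pair array with a clipped sequence separation are all assumptions made for this example.

```python
# Toy illustration of the two array shapes; AlphaFold's real features are far richer.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 standard residues plus a gap symbol


def encode_msa(msa):
    """Return an Nseq x Nres array (list of lists) of integer residue codes."""
    n_res = len(msa[0])
    if any(len(seq) != n_res for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [[AMINO_ACIDS.index(aa) for aa in seq] for seq in msa]


def init_pair(n_res, max_sep=4):
    """Return an Nres x Nres array holding the clipped separation |i - j|."""
    return [[min(abs(i - j), max_sep) for j in range(n_res)] for i in range(n_res)]


# Hypothetical alignment: the target sequence first, then two homologs.
msa = ["MKTAYIA", "MKSAY-A", "MRTAYLA"]
msa_repr = encode_msa(msa)            # one row per sequence
pair_repr = init_pair(len(msa[0]))    # one row and one column per residue

print(len(msa_repr), len(msa_repr[0]))    # Nseq, Nres
print(len(pair_repr), len(pair_repr[0]))  # Nres, Nres
```

In this sketch the MSA array grows with the number of homologs, while the pair array depends only on the length of the protein, which is why the two representations can carry evolutionary and spatial information respectively.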
Stage 2:
This stage, the structure module, introduces an explicit 3D structure for each residue of the protein, in the form of a rotation and a translation, and refines it into a highly accurate protein structure with precise atomic details.
Advances in AI natural language processing could lead to further insights into DNA, specifically into nucleic acid and amino acid sequences and what those sequences reveal about a protein’s structure and function. To some extent, AI’s natural language processing abilities could be used to “decipher” the language of DNA. Although the long strings of adenines, thymines, cytosines, and guanines may look alien to us, unsupervised machine learning could allow us to decode DNA (both coding and non-coding) and the properties of the resulting proteins. Finally, it could be used to construct artificial genomes much faster than currently thought possible, which could in turn be used to treat diseases and disorders.
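One common first step when treating DNA as a language is to split a sequence into short overlapping "words" called k-mers, which a language model can then consume as tokens. The sketch below shows only that tokenization step, under stated assumptions: the function name `kmer_tokens`, the choice of k = 3, and the example fragment are hypothetical and are not taken from any of the studies reviewed here.

```python
from collections import Counter


def kmer_tokens(dna, k=3, stride=1):
    """Split a DNA string into overlapping k-mers ("words")."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return [dna[i:i + k] for i in range(0, len(dna) - k + 1, stride)]


sequence = "ATGGCGTATGGC"  # hypothetical fragment, not a real gene
tokens = kmer_tokens(sequence, k=3)
vocab = Counter(tokens)  # how often each "word" occurs

print(tokens)
print(vocab)
```

An unsupervised model would then learn statistical regularities over such tokens, in the same way that text models learn from words, without being told which regions are coding or non-coding.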
AI in Genetics
In 2022, researchers at the University of Washington (UW) School of Medicine in Seattle published a study of a protein complex known as Polycomb Repressive Complex 2 (PRC2) (Levy et al., 2022). PRC2 turns genes off by attaching methyl groups to histones, specifically to lysine 27 of histone H3 (H3K27), creating a modified histone mark called H3K27me3. Genes can be expressed again by blocking PRC2 with chemicals, but the resulting changes are usually imprecise and the genome is more likely to be damaged. The UW researchers therefore developed a computationally designed protein (EB) that binds to the EZH2 binding site on EED (EZH2 and EED are both subunits of PRC2) with a higher affinity than EZH2 itself. As a negative control (NC), they included an EB variant in which the amino acid mutations F47E and I54E abolish EED binding. By displacing EZH2 from EED, EB reduces the levels of EZH2 and JARID2 (jumonji, AT-rich interactive domain 2, a subunit of PRC2), which in turn reduces H3K27me3 repressive marks throughout the promoter regions.
However, caution is necessary. When EZH2 is phosphorylated (phosphorylation is the addition of a phosphate group to a molecule), ZMYND8 (Zinc Finger MYND-Type Containing 8) preferentially binds to it (Tang et al., 2021). This can disturb the canonical role of EZH2, making it independent of PRC2 and giving it other oncogenic functions that promote cancer progression. It could also alter the structure of the PRC2 complex itself, causing changes in gene expression that contribute to cancer development. For this reason, the UW researchers fused the AI-designed protein to a disabled version of Cas9 (dCas9) to limit unwanted effects on the genome. Cas9 is the protein component of the gene-editing tool CRISPR and uses an RNA molecule as an address tag to find its target. Disabling Cas9 ensures that the genome sequence is left unaltered, while its ability to direct the fused protein to a specific location is retained for researchers to use.
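The "address tag" idea can be sketched as a simple string search. The example assumes the commonly used Streptococcus pyogenes Cas9, which recognizes a 20-nucleotide target only when it is immediately followed by an NGG motif (the PAM); that detail is not stated in the text above. The guide sequence, the toy "genome", and the function names are hypothetical, and real guide design also has to account for mismatches and off-target sites.

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(dna))


def find_target_sites(genome, guide):
    """Find exact guide matches that are immediately followed by an NGG PAM.

    Returns (strand, start) tuples, where start is the 0-based index of the
    match on that strand read 5' to 3'.
    """
    sites = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        start = seq.find(guide)
        while start != -1:
            pam = seq[start + len(guide): start + len(guide) + 3]
            if len(pam) == 3 and pam[1:] == "GG":  # N can be any base
                sites.append((strand, start))
            start = seq.find(guide, start + 1)
    return sites


# Hypothetical 20-nt guide, written as DNA (the RNA itself would use U for T).
guide = "GACCTGAAGTCCATCGGTAC"
# Toy sequence: one copy of the target followed by a PAM (TGG), one without (TAA).
genome = "TTAC" + guide + "TGG" + "ACGT" + guide + "TAA"

print(find_target_sites(genome, guide))
```

Only the copy followed by a PAM is reported. A disabled Cas9 would bind at such a site without cutting, bringing any protein fused to it, such as EB, to that location.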
Result: In this experiment, the UW researchers transformed iPSCs (induced pluripotent stem cells) into placental progenitor cells by reactivating two genes. This matters for biomedical research, as these stem cells can help scientists study human development and could provide a potentially unlimited source of any type of human cell for therapeutic purposes. Their technique also pinpointed PRC2-controlled regulatory regions known as TATA boxes (or Goldberg–Hogness boxes), specifically for the gene TBX18; these promoter regions are where individual genes are activated. In addition, the researchers used ChIP-qPCR (chromatin immunoprecipitation followed by quantitative polymerase chain reaction) to dissect the mechanism of EBdCas9 action. Over three days, they used this technique to confirm that, by directly blocking EZH2 from binding to EED, only EBdCas9 reduces EZH2, which in turn reduces H3K27me3 marks.
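ChIP-qPCR results are often reported as "percent input", which expresses how much of a DNA region was pulled down relative to the starting chromatin. The sketch below shows that common calculation as an illustration of how a drop in H3K27me3 could be quantified. It is not the analysis used by Levy et al.: the Ct values are invented, the 1% input fraction is an assumption, and the formula assumes perfect doubling of DNA in each PCR cycle.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent-input signal for one ChIP-qPCR measurement.

    ct_ip: Ct of the immunoprecipitated sample.
    ct_input: Ct of the input sample, which holds only `input_fraction`
        of the chromatin used for the immunoprecipitation.
    """
    # Shift the input Ct to what it would be if 100% of the chromatin were used.
    adjusted_input = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input - ct_ip)


# Hypothetical Ct values for an H3K27me3 pull-down at one promoter.
control = percent_input(ct_ip=25.0, ct_input=24.0)
treated = percent_input(ct_ip=27.0, ct_input=24.0)

print(f"control: {control:.3f}% of input")
print(f"treated: {treated:.3f}% of input")
print(f"treated / control: {treated / control:.2f}")
```

Because each PCR cycle corresponds to a doubling, a Ct that is two cycles higher in the treated sample means roughly a fourfold lower signal for the mark at that site.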
AI’s Revolution
AI has the potential to revolutionize disease recognition and to speed up the search for the genetic sources of many diseases. New AI software similar to AlphaFold could model how a particular protein is made and folded, helping scientists answer questions that would otherwise take decades of research. When scientists needed a protein to prevent gene silencing by PRC2, they could not fully anticipate how the EB protein might create further complications with other proteins. Tools like AlphaFold could help predict the outcome of producing such a protein by analyzing its molecular structure and identifying which segments of the protein could be problematic. They could also narrow down candidate solutions far faster than conventional approaches, saving time on the typical hurdles in research. Combining both capabilities could lead to new medicines and discoveries. AI’s involvement in biomedical fields is already noticeable, and as AI develops, it should be able to recognize diseases, and help prevent them, at a much more rapid pace, saving lives.
References
Deevy O, Bracken AP. PRC2 functions in development and congenital disorders. Development. 2019 Oct 1;146(19):dev181354. doi: 10.1242/dev.181354. PMID: 31575610; PMCID: PMC6803372.
Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583-589. doi: 10.1038/s41586-021-03819-2.
Levy S, Somasundaram L, Raj IX, Ic-Mex D, Phal A, Schmidt S, Ng WI, Mar D, Decarreau J, Moss N, Alghadeer A, Honkanen H, Sarthy J, Vitanza A, Hawkins RD, Mathieu J, Wang Y, Baker D, Bomsztyk K, Ruohola-Baker H. dCas9 fusion to computer-designed PRC2 inhibitor reveals functional TATA box in distal promoter region. Cell Rep. 2022 Mar 1;38(9):110457. doi: 10.1016/j.celrep.2022.110457. PMID: 35235780; PMCID: PMC8984963.
Shin J, Jiang F, Liu JJ, Bray NL, Rauch BJ, Baik SH, Nogales E, Bondy-Denomy J, Corn JE, Doudna JA. Disabling Cas9 by an anti-CRISPR DNA mimic. Sci Adv. 2017 Jul 12;3(7):e1701620. doi: 10.1126/sciadv.1701620. PMID: 28706995; PMCID: PMC5507636.
Skolnick J, Gao M, Zhou H, Singh S. AlphaFold 2: Why It Works and Its Implications for Understanding the Relationships of Protein Sequence, Structure, and Function. J Chem Inf Model. 2021 Oct 25;61(10):4827-4831. doi: 10.1021/acs.jcim.1c01114. Epub 2021 Sep 29. PMID: 34586808; PMCID: PMC8592092.
Tang B, Sun R, Wang D, Sheng H, Wei T, Wang L, Zhang J, Ho TH, Yang L, Wei Q, Huang H. ZMYND8 preferentially binds phosphorylated EZH2 to promote a PRC2-dependent to -independent function switch in hypoxia-inducible factor-activated cancer. Proc Natl Acad Sci U S A. 2021 Feb 23;118(8):e2019052118. doi: 10.1073/pnas.2019052118. PMID: 33593912; PMCID: PMC7923384.