tag:theconversation.com,2011:/nz/topics/protein-structure-96898/articlesProtein structure – The Conversation2022-11-22T13:25:50Ztag:theconversation.com,2011:article/1937052022-11-22T13:25:50Z2022-11-22T13:25:50ZScientists uncovered the structure of the key protein for a future hepatitis C vaccine – here’s how they did it<figure><img src="https://images.theconversation.com/files/496217/original/file-20221118-14-r6a8me.jpg?ixlib=rb-1.1.0&rect=0%2C0%2C1999%2C1499&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Imaging the proteins on the surface of HCV has been challenging because of the virus's shape-shifting nature.</span> <span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/illustration/hepatitis-c-virus-particles-illustration-royalty-free-illustration/1042127452">Juan Gaertner/Science Photo Library via Getty Images</a></span></figcaption></figure><p>The <a href="https://www.cdc.gov/hepatitis/hcv/index.htm">hepatitis C virus, or HCV</a>, causes a chronic liver infection that can lead to permanent liver scarring and, in dire cases, cancer. It affects around <a href="https://doi.org/10.1007/s42399-020-00588-3">71 million people worldwide</a> and causes approximately 400,000 deaths each year. While <a href="https://www.uptodate.com/contents/direct-acting-antivirals-for-the-treatment-of-hepatitis-c-virus-infection">treatments are available</a> for HCV-related infections, they are expensive, hard to access and do not protect against reinfection. A vaccine that can help prevent HCV infection is a major unmet medical and public health need. </p>
<p>One major reason there hasn’t been an HCV vaccine yet is that scientists have yet to identify the proper antigen, or the part of the virus that would trigger a protective immune response in the body.</p>
<p>Decades of research have pinpointed <a href="https://doi.org/10.1038/nrmicro3098">HCV E1E2</a>, the only protein on the surface of the virus, as the most promising vaccine candidate. However, developing an HCV vaccine based on that protein is limited by uncertainty around what it looks like. Knowing the structure of the protein is necessary to figure out how the immune system responds to the virus.</p>
<p>So how do researchers capture the structure of a single protein on a shape-shifting virus? </p>
<p>We are researchers who specialize in <a href="https://scholar.google.com/citations?user=Xejfx54AAAAJ&hl=en">microscopy</a> and <a href="https://scholar.google.com/citations?user=iQj9rSwAAAAJ&hl=en">vaccine design</a>. With new technology, we were able to <a href="https://doi.org/10.1126/science.abn9884">visualize the molecular details</a> of this elusive protein, unlocking key insights into how this virus works and offering a potential blueprint for a future vaccine.</p>
<p>This is how we did it.</p>
<h2>Challenges of capturing a shape-shifting virus</h2>
<p>One reason it has been so difficult to capture the structure of the HCV E1E2 protein is that it is both <a href="https://doi.org/10.1016/j.celrep.2022.110859">flexible and fragile</a>. It changes its shape so often and is so easily broken that it’s challenging to purify. </p>
<p>As an analogy, imagine a bowl of spaghetti drenched in tomato sauce. Now imagine trying to take a picture of each individual piece of spaghetti in the same position over time while the bowl is shaking. Hard to do, right? That’s what it was like to image the full E1E2 protein.</p>
<p>There were also <a href="https://doi.org/10.1126/science.1251652">technological barriers</a>. Until recently, available imaging techniques were limited in their ability to view microscopic proteins. <a href="https://chem.libretexts.org/Bookshelves/Analytical_Chemistry/Supplemental_Modules_(Analytical_Chemistry)/Instrumentation_and_Analysis/Diffraction_Scattering_Techniques/X-ray_Crystallography">X-ray crystallography</a>, for instance, is unable to capture molecules that frequently change and shape-shift, like HCV. Moreover, other options, such as <a href="https://chem.libretexts.org/Bookshelves/Analytical_Chemistry/Physical_Methods_in_Chemistry_and_Nano_Science_(Barron)/04%3A_Chemical_Speciation/4.07%3A_NMR_Spectroscopy">nuclear magnetic resonance spectroscopy</a>, required cutting large parts of the protein or chemically manipulating it in a way that would transform its physiological state and potentially alter its function.</p>
<p>So to examine the structure of E1E2, we needed a way to extract and purify, stabilize and trap the entire shape-shifting protein into one configuration.</p>
<h2>How to take a picture of a protein</h2>
<p><a href="https://doi.org/10.1038/d41586-020-01658-1">Cryo-EM, or cryo-electron microscopy</a>, is a type of imaging technique that views specimens at cryogenic temperatures, in this case the boiling point of nitrogen: minus 320.8 degrees Fahrenheit (minus 196 Celsius). With temperatures that cold, ice freezes so quickly that it doesn’t have time to crystallize. That creates a beautiful glasslike frame around the protein of interest, allowing an unhindered view of every structural detail. Cryo-EM also requires very little protein to work, reducing the amount of material we would need to purify. </p>
<p>Winner of the <a href="https://www.nobelprize.org/prizes/chemistry/2017/press-release/">2017 Nobel Prize in chemistry</a> and <a href="https://doi.org/10.1038/nmeth.3730">Nature magazine’s 2015 “Method of the Year</a>” award, cryo-EM is superb for imaging biological macromolecules in their native, or natural, state in the aqueous environment of human blood. Cryo-EM was also pivotal for characterizing the <a href="https://doi.org/10.1038/nature17200">structure of the COVID-19 virus</a> and its variants.</p>
<figure>
<iframe width="440" height="260" src="https://www.youtube.com/embed/Qq8DO-4BnIY?wmode=transparent&start=0" frameborder="0" allowfullscreen=""></iframe>
<figcaption><span class="caption">Cryo-EM has allowed researchers to see complex proteins they weren’t able to before.</span></figcaption>
</figure>
<p>So how do you take a picture of a protein? </p>
<p>First, we embedded the genetic code to make E1E2 in human cells in a petri dish so we would have sufficient amounts of protein to study. After purifying the protein, we <a href="https://caic.bio.cam.ac.uk/electron-microscopy/SpecimenPrep/PlungeFreezing">plunged it into liquid ethane</a> followed by liquid nitrogen. Liquid ethane is used to freeze the protein because it has a higher boiling point than liquid nitrogen. This means it is able to capture more heat before turning to a gas, allowing the protein to freeze much more quickly than it would in liquid nitrogen and avoid structural damage. </p>
<p>Once the protein was vitrified, or in a glasslike ice state, we were able not just to see its overall structure, but also to capture the multiple individual configurations that the protein takes when it shape-shifts, including its less stable forms.</p>
<p>At this point, our protein was ready for its close-up. We employed a microscope that <a href="https://www.ccber.ucsb.edu/ucsb-natural-history-collections-botanical-plant-anatomy/transmission-electron-microscope">uses a beam of focused, high-energy electrons</a> and a very fancy camera that detects how the electrons bounce off the protein’s surface. This created a 2D image that we then mathematically transformed into a 3D model. And that was how we got the coveted “close-up” of HCV’s surface protein. </p>
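<p>The article does not spell out the math behind turning 2D images into a 3D model. One principle commonly used in this kind of reconstruction is the projection-slice theorem: the Fourier transform of a projection equals a central slice through the Fourier transform of the full object. The sketch below illustrates it one dimension down, with a 2D array projected to 1D. The random array and the variable names are stand-ins for illustration, not the authors’ data or software.</p>

```python
import numpy as np

# Stand-in "object": a random 2D density map (an assumption for illustration).
rng = np.random.default_rng(0)
density = rng.random((64, 64))

# A projection sums the density along one direction, much as an
# electron image flattens a 3D particle into 2D.
projection = density.sum(axis=0)

# Fourier transform of the 1D projection...
ft_of_projection = np.fft.fft(projection)

# ...versus the central (zero-frequency) row of the 2D Fourier transform.
central_slice = np.fft.fft2(density)[0, :]

# The two agree up to floating-point error.
slice_matches = np.allclose(ft_of_projection, central_slice)
print(slice_matches)
```

<p>In three dimensions the same idea means that each 2D image fills in one plane of the particle’s 3D Fourier transform, so many views from different angles can be combined into a 3D map.</p>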
<figure>
<iframe width="440" height="260" src="https://www.youtube.com/embed/jgEQ6A2-liU?wmode=transparent&start=0" frameborder="0" allowfullscreen=""></iframe>
<figcaption><span class="caption">This video shows the newly identified 3D structure of the E1E2 protein on the surface of the hepatitis C virus. The two main subunits of the protein are colored in pink and blue. Sugar molecules are colored in green.</span></figcaption>
</figure>
<p>Our next step was to assess the location of each amino acid, or building block of the protein, in 3D space. Because every amino acid has a unique shape, we used a computer program that could identify each one in our 3D map. This allowed us to manually reconstruct a high-resolution model of the protein, one building block at a time.</p>
<h2>A new tool to design an HCV vaccine</h2>
<p>Our 3D map and model of the HCV E1E2 protein support previous research describing its structure while providing new insights into features that will help pave the way for a long-sought vaccine against this virus. </p>
<p>For example, our structure reveals that the interface between the two main parts of the protein is stabilized by sugars and hydrophobic patches, or areas that push out water molecules. This creates sticky binding hubs along the protein and keeps it from falling apart – a potential site for protective antibodies and new drugs to target. </p>
<p>Researchers now have the tools to design antiviral drugs and vaccines against HCV infection.</p>
<p class="fine-print"><em><span>Lisa Eshun-Wilson receives funding from the National Science Foundation. </span></em></p><p class="fine-print"><em><span>Alba Torrents de la Peña receives funding from Netherlands Organization for Scientific Research (NWO) Rubicon Grant 45219118. </span></em></p>Using a Nobel Prize-winning technique called cryo-EM, researchers were able to identify potential areas on the hepatitis C virus that a vaccine could target.Lisa Eshun-Wilson, Postdoctoral Scholar in Molecular and Cell Biology, The Scripps Research InstituteAlba Torrents de la Peña, Postdoctoral Fellow in Integrative Structural and Computational Biology, The Scripps Research InstituteLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1732092022-01-05T13:47:58Z2022-01-05T13:47:58ZWhen researchers don’t have the proteins they need, they can get AI to ‘hallucinate’ new structures<figure><img src="https://images.theconversation.com/files/438742/original/file-20211221-129369-1f6d9kk.jpg?ixlib=rb-1.1.0&rect=600%2C104%2C869%2C650&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">De novo protein design with deep learning can open new doors for medicine and many other fields.</span> <span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/illustration/human-ace2-receptor-molecule-royalty-free-illustration/1227506538"> Kateryna Kon/Science Photo Library via Getty Images</a></span></figcaption></figure><p>All living organisms use proteins, which encompass a vast number of complex molecules. 
They perform a wide array of functions, from allowing plants to <a href="https://www.energy.gov/science/articles/photosynthesis-gathering-sunshine-world-s-smallest-antennas">use solar energy for oxygen production</a> to helping your <a href="https://www.livescience.com/antibodies.html">immune system</a> fight against pathogens to letting your <a href="https://www.britannica.com/science/protein/The-muscle-proteins">muscles</a> perform physical work. <a href="https://doi.org/10.1007/978-1-61779-921-1_1">Many drugs</a> are also based on proteins.</p>
<p>For many areas of biomedical research and drug development, however, there are no natural proteins that can serve as suitable starting points to build new proteins. Researchers designing new drugs to <a href="https://www.doi.org/10.1126/science.abd9909">prevent COVID-19 infection</a>, or developing proteins that can <a href="https://doi.org/10.1038/s41586-019-1432-8">turn genes on or off</a> or <a href="https://www.doi.org/10.1126/science.aay2790">turn cells into computers</a>, had to create new proteins from scratch.</p>
<p>This process of <a href="https://doi.org/10.1038/nature19946">de novo protein design</a> can be difficult to get right. <a href="https://scholar.google.com/citations?user=Hp8zwAgAAAAJ&hl=ru">Protein engineers like me</a> have been trying to figure out ways to more efficiently and accurately design new proteins with the properties we need.</p>
<p>Luckily, a form of artificial intelligence called <a href="https://www.techtarget.com/searchenterpriseai/definition/deep-learning-deep-neural-network">deep learning</a> may provide an elegant way to create proteins that did not exist previously – <a href="https://doi.org/10.1038/s41586-021-04184-w">hallucination</a>.</p>
<figure>
<iframe width="440" height="260" src="https://www.youtube.com/embed/PJLT0cAPNfs?wmode=transparent&start=0" frameborder="0" allowfullscreen=""></iframe>
<figcaption><span class="caption">New proteins created from scratch can be deployed to tackle a wide range of environmental and medical challenges.</span></figcaption>
</figure>
<h2>Designing proteins from scratch</h2>
<p>Proteins are made up of hundreds to thousands of smaller building blocks called <a href="https://www.britannica.com/science/amino-acid">amino acids</a>. These amino acids are connected to one another in long chains that fold up to form a <a href="https://www.britannica.com/science/protein">protein</a>. The order in which these amino acids are connected to one another determines each protein’s unique structure and function.</p>
<figure class="align-right zoomable">
<a href="https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="Illustration of the four levels of protein structure." src="https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=1046&fit=crop&dpr=1 600w, https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=1046&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=1046&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=1315&fit=crop&dpr=1 754w, https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=1315&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=1315&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Proteins are composed of amino acid chains that fold into a 3D structure.</span>
<span class="attribution"><a class="source" href="https://commons.wikimedia.org/wiki/File:Main_protein_structure_levels_en.svg">LadyofHats/Wikimedia Commons</a></span>
</figcaption>
</figure>
<p>The biggest challenge protein engineers face when designing new proteins is coming up with a protein structure that will perform a desired function. To get around this problem, researchers typically create design templates based on naturally occurring proteins with a similar function. These templates have instructions on how to create the unique folds of each particular protein. However, because a template must be created for each individual fold, this strategy is time-consuming, labor-intensive and limited by what proteins are available in nature.</p>
<p>Over the past few years, various research groups, <a href="https://doi.org/10.1073/pnas.1914677117">including</a> the <a href="https://www.bakerlab.org/">lab I work in</a>, have developed a number of dedicated <a href="https://towardsdatascience.com/a-laymans-guide-to-deep-neural-networks-ddcea24847fb">deep neural networks</a> – computer programs that use multiple processing layers to “learn” from input data to make predictions about a desired output.</p>
<p>When the desired output is a new protein, millions of parameters describing different facets of a protein are put into the network. What’s predicted is a randomly chosen sequence of amino acids mapped onto the most probable 3D structure that sequence would take.</p>
<p>Network predictions for a random amino acid sequence are blurry, meaning the final structure of the protein is not very clear-cut, while both naturally occurring proteins and proteins built from scratch produce much more well-defined protein structures.</p>
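<p>The article does not say how “blurry” is measured. One common way to put a number on it, assuming the network outputs a probability distribution (for example, over possible distances between two amino acids), is Shannon entropy: a flat distribution scores high, a peaked one scores low. The distributions below are made-up values for illustration only.</p>

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits; higher means a flatter, 'blurrier' prediction."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical predictions over four distance bins.
blurry = [0.25, 0.25, 0.25, 0.25]   # network has no idea which bin is right
sharp = [0.97, 0.01, 0.01, 0.01]    # network is confident in one bin

blurry_entropy = shannon_entropy(blurry)
sharp_entropy = shannon_entropy(sharp)

print(blurry_entropy > sharp_entropy)
```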
<h2>Hallucinating new proteins</h2>
<p>These observations hint at one way that new proteins can be generated from scratch – by tweaking random inputs to the network until predictions yield a well-defined structure.</p>
<p>The <a href="https://doi.org/10.1038/s41586-021-04184-w">protein generation method</a> <a href="https://www.ipd.uw.edu/">my colleagues</a> and I developed is conceptually similar to <a href="https://towardsdatascience.com/everything-you-ever-wanted-to-know-about-computer-vision-heres-a-look-why-it-s-so-awesome-e8a58dfb641e">computer vision</a> methods such as <a href="https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html">Google’s DeepDream</a>, which finds and enhances patterns in images. </p>
<p>These methods work by taking networks trained to recognize human faces or other patterns in images, like the shape of an animal or an object, and inverting them so that they learn to recognize these patterns where they don’t exist. In DeepDream, for example, the network is given arbitrary input images that are adjusted until the network can recognize a face or some other shape in the image. While the final image doesn’t look much like a face to a person looking at it, it would to the neural network.</p>
<p>The products of this technique are often referred to as <a href="https://www.americanscientist.org/article/computer-vision-and-computer-hallucinations">hallucinations</a>, and this is what we call our designed proteins, too.</p>
<figure>
<iframe width="440" height="260" src="https://www.youtube.com/embed/hnT-P3aALVE?wmode=transparent&start=0" frameborder="0" allowfullscreen=""></iframe>
<figcaption><span class="caption">Deep neural networks can also learn how to hallucinate images from words.</span></figcaption>
</figure>
<p><a href="https://doi.org/10.1038/s41586-021-04184-w">Our method</a> starts by passing a random amino acid sequence through a deep neural network. The resulting predictions are initially blurry, with unclear structures, as expected for random sequences. Next, we introduce a mutation that changes one amino acid in the chain into a different one and pass this new sequence through the network again. If this change gives the protein a more defined structure, then we keep the amino acid and we introduce another mutation into the sequence.</p>
<p>With each repetition of this process, the proteins get closer and closer to the real shape they would take if they were produced in nature. Thousands of repetitions are required to create a brand-new protein. </p>
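<p>The mutate-and-keep loop described above can be sketched in a few lines. This is a minimal illustration, not the published method: <code>toy_sharpness</code> is a hypothetical stand-in for the neural network, and it simply rewards a made-up repeating pattern so that the loop has something to climb toward. The sequence length, step count and pattern are all assumptions.</p>

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard one-letter codes
PATTERN = "LKELLKK"                    # hypothetical pattern the toy score rewards

def toy_sharpness(sequence):
    """Stand-in for the network: counts positions that match PATTERN.
    A real run would score how well defined the predicted structure is."""
    return sum(aa == PATTERN[i % len(PATTERN)] for i, aa in enumerate(sequence))

def hallucinate(length=30, steps=2000, seed=0):
    rng = random.Random(seed)
    # Start from a random amino acid sequence.
    sequence = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    start = best = toy_sharpness(sequence)
    for _ in range(steps):
        # Mutate one position to a randomly chosen amino acid.
        position = rng.randrange(length)
        old = sequence[position]
        sequence[position] = rng.choice(AMINO_ACIDS)
        score = toy_sharpness(sequence)
        if score > best:
            best = score                 # sharper: keep the mutation
        else:
            sequence[position] = old     # not sharper: undo it
    return "".join(sequence), start, best

designed, start_score, final_score = hallucinate()
print(designed, start_score, final_score)
```

<p>Because a mutation is kept only when the score improves, the score can never go down from one step to the next, which is why many repetitions steadily push a random sequence toward a well-defined one.</p>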
<p>Using this process, we generated 2,000 new protein sequences predicted to fold into well-defined structures. Of these, we selected over 100 that were the most distinct in shape to physically recreate in the lab. Finally, we chose three of the top candidates for detailed analysis and confirmed that they were close matches to the shapes predicted by our hallucinated models.</p>
<h2>Why hallucinate new proteins?</h2>
<p>Our hallucination approach greatly simplifies the protein design pipeline. By eliminating the need for templates, researchers can directly focus on creating a protein based on desired functions and let the network take care of figuring out the structure for them.</p>
<p>Our work opens up multiple avenues for researchers to explore. Our lab is <a href="https://doi.org/10.1101/2021.11.10.468128">currently investigating</a> how to best use this hallucination approach to generate even more specificity in the function of designed proteins. Our approach can also be readily extended to design new proteins using <a href="https://www.ipd.uw.edu/2021/07/rosettafold-accurate-protein-structure-prediction-accessible-to-all/">other</a> <a href="https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology">recently developed</a> deep neural networks.</p>
<p>The potential applications of de novo proteins are vast. With deep neural networks, researchers will be able to create even more proteins that can <a href="https://doi.org/10.1038/s41586-020-2149-4">break down plastics</a> to reduce environmental pollution, <a href="https://doi.org/10.1038/s41586-021-03258-z">identify and respond</a> to unhealthy cells and <a href="https://doi.org/10.1126/science.aay5051">improve vaccines</a> against existing and new pathogens – just to name a few.</p>
<p class="fine-print"><em><span>Ivan Anishchenko receives funding from NSF (grant DBI 1937533) and NIAID (Federal Contract HHSN272201700059C). </span></em></p>Using a form of artificial intelligence called deep neural networks, researchers can generate new proteins from scratch without having to consult nature.Ivan Anishchenko, Acting instructor in Computational Biology, University of WashingtonLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1687182021-09-29T15:14:04Z2021-09-29T15:14:04ZThe music of proteins is made audible through a computer program that learns from Chopin<figure><img src="https://images.theconversation.com/files/423473/original/file-20210928-18-1783wgm.jpg?ixlib=rb-1.1.0&rect=0%2C0%2C962%2C579&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Training an algorithm to play proteins like Chopin can produce more melodious songs.</span> <span class="attribution"><a class="source" href="https://commons.wikimedia.org/wiki/File:FI_CHOPIN.jpg">Frederic Chopin/Wikimedia Commons</a></span></figcaption></figure><p>With the right computer program, proteins become pleasant music.</p>
<p>There are many surprising analogies between <a href="https://theconversation.com/what-is-a-protein-a-biologist-explains-152870">proteins</a>, the basic building blocks of life, and musical notation. These analogies can be used not only to help advance research, but also to make the complexity of proteins accessible to the public.</p>
<p>We’re <a href="https://scholar.google.com.sg/citations?user=Ic2nqDsAAAAJ&hl=en">computational</a> <a href="https://scholar.google.com/citations?user=784B-f0AAAAJ&hl=en">biologists</a> who believe that hearing the sound of life at the molecular level could help inspire people to learn more about biology and the computational sciences. While creating music based on proteins <a href="https://news.mit.edu/2019/translating-proteins-music-0626">isn’t new</a>, different musical styles and composition algorithms had yet to be explored. So we led a team of high school students and other scholars to figure out how to <a href="https://doi.org/10.1016/j.heliyon.2021.e07933">create classical music from proteins</a>.</p>
<h2>The musical analogies of proteins</h2>
<p><a href="https://www.nature.com/scitable/topicpage/protein-structure-14122136/">Proteins</a> are structured like folded chains. These chains are composed of small units of 20 possible amino acids, each labeled by a letter of the alphabet. </p>
<figure class="align-right zoomable">
<a href="https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="Illustration of the four levels of protein structure." src="https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=1046&fit=crop&dpr=1 600w, https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=1046&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=1046&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=1315&fit=crop&dpr=1 754w, https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=1315&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/423442/original/file-20210927-27-3uc4g3.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=1315&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Aspects of protein structure can be analogous to musical notation.</span>
<span class="attribution"><a class="source" href="https://commons.wikimedia.org/wiki/File:Main_protein_structure_levels_en.svg">LadyofHats/Wikimedia Commons</a></span>
</figcaption>
</figure>
<p>A protein chain can be represented as a string of these alphabetic letters, very much like a string of music notes in alphabetical notation.</p>
<p>Protein chains can also fold into wavy and curved patterns with ups, downs, turns and loops. Likewise, music consists of sound waves of higher and lower pitches, with changing tempos and repeating motifs. </p>
<p>Protein-to-music algorithms can thus map the structural and physicochemical features of a string of amino acids onto the musical features of a string of notes.</p>
<h2>Enhancing the musicality of protein mapping</h2>
<p>Protein-to-music mapping can be fine-tuned by basing it on the features of a specific music style. This enhances musicality, or the melodiousness of the song, when converting amino acid properties, such as sequence patterns and variations, into analogous musical properties, like pitch, note lengths and chords.</p>
<p>For our study, we specifically selected 19th-century <a href="https://courses.lumenlearning.com/musicapp_historical/chapter/romantic-music/">Romantic period classical piano music</a>, which includes composers like Chopin and Schubert, as a guide because it typically spans a wide range of notes with more complex features such as <a href="https://hellomusictheory.com/learn/chromatic-scale/">chromaticism</a>, like playing both white and black keys on a piano in order of pitch, and chords. Music from this period also tends to have lighter and more graceful and emotive melodies. Songs are usually <a href="https://hellomusictheory.com/learn/homophonic-texture/">homophonic</a>, meaning they follow a central melody with accompaniment. These features allowed us to test out a greater range of notes in our protein-to-music mapping algorithm. In this case, we chose to analyze features of <a href="https://www.youtube.com/watch?v=Gus4dnQuiGk">Chopin’s “Fantaisie-Impromptu”</a> to guide our development of the program. </p>
<p>To test the algorithm, we applied it to 18 proteins that play a key role in various biological functions. Each amino acid in the protein is mapped to a particular note based on how frequently it appears in the protein, and other aspects of its biochemistry correspond with other aspects of the music. A larger amino acid, for instance, would have a shorter note length, and vice versa.</p>
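<p>As a rough illustration of this kind of mapping, the sketch below assigns each amino acid a pitch by how often it appears and a note length by its size. The details are assumptions made for the example, not the published algorithm: pitches are MIDI numbers starting at middle C (60) and rising by one semitone per frequency rank, sizes are approximate residue masses in daltons, and the size cutoffs are arbitrary.</p>

```python
from collections import Counter

# Approximate residue masses in daltons (rounded; used only to rank size).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

def note_length(amino_acid):
    """Larger amino acids get shorter notes (cutoffs are arbitrary)."""
    mass = RESIDUE_MASS[amino_acid]
    if mass > 120:
        return 0.25
    if mass > 90:
        return 0.5
    return 1.0

def protein_to_notes(sequence, base_pitch=60):
    counts = Counter(sequence)
    # Rank amino acids from most to least frequent; ties break alphabetically.
    ranked = sorted(counts, key=lambda aa: (-counts[aa], aa))
    pitch_of = {aa: base_pitch + rank for rank, aa in enumerate(ranked)}
    # One (pitch, length-in-beats) pair per amino acid, in sequence order.
    return [(pitch_of[aa], note_length(aa)) for aa in sequence]

notes = protein_to_notes("GAGWG")
print(notes)
```

<p>In this toy sequence, glycine (G) is both the most frequent and the smallest amino acid, so it gets the base pitch and the longest note, while the bulky tryptophan (W) gets the shortest.</p>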
<p>The resulting music is complex, with notable variations in pitch, loudness and rhythm. Because the algorithm was completely based on the amino acid sequence and no two proteins share the same amino acid sequence, each protein will produce a distinct song. This also means that there are variations in musicality across the different pieces, and interesting patterns can emerge. </p>
<p>For example, music generated from the receptor protein that binds to the <a href="https://doi.org/10.1152/physrev.2001.81.2.629">hormone and neurotransmitter oxytocin</a> has some recurring motifs due to the repetition of certain small sequences of amino acids. </p>
<p><audio preload="metadata" controls="controls" data-duration="215" data-image="" data-title="OXTR protein music" data-size="3436911" data-source="Zhang et al." data-source-url="https://EMBARGO.com" data-license="CC BY-NC-ND" data-license-url="http://creativecommons.org/licenses/by-nc-nd/4.0/">
<source src="https://cdn.theconversation.com/audio/2282/music-oxtr.mp3" type="audio/mpeg">
</audio>
<div class="audio-player-caption">
OXTR protein music.
<span class="attribution"><a class="source" rel="nofollow" href="https://EMBARGO.com">Zhang et al.</a>, <a class="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/">CC BY-NC-ND</a><span class="download"><span>3.28 MB</span> <a target="_blank" href="https://cdn.theconversation.com/audio/2282/music-oxtr.mp3">(download)</a></span></span>
</div></p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/423439/original/file-20210927-21-5qw03m.png?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="Oxytocin receptor protein structure" src="https://images.theconversation.com/files/423439/original/file-20210927-21-5qw03m.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/423439/original/file-20210927-21-5qw03m.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=299&fit=crop&dpr=1 600w, https://images.theconversation.com/files/423439/original/file-20210927-21-5qw03m.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=299&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/423439/original/file-20210927-21-5qw03m.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=299&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/423439/original/file-20210927-21-5qw03m.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=376&fit=crop&dpr=1 754w, https://images.theconversation.com/files/423439/original/file-20210927-21-5qw03m.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=376&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/423439/original/file-20210927-21-5qw03m.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=376&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">OXTR, or the oxytocin receptor, has repeating sequences of amino acids.</span>
<span class="attribution"><a class="source" href="https://alphafold.ebi.ac.uk/entry/P30559">AlphaFold Data/EMBL-EBI</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span>
</figcaption>
</figure>
<p>On the other hand, music generated from <a href="https://medlineplus.gov/genetics/gene/tp53/">tumor antigen p53</a>, a protein that prevents cancer formation, is highly chromatic, producing particularly fascinating phrases where the music sounds almost <a href="https://www.britannica.com/art/toccata">toccata-like</a>, a style that often features fast and virtuoso technique.</p>
<p><audio preload="metadata" controls="controls" data-duration="139" data-image="" data-title="TP53 protein music" data-size="2223993" data-source="Zhang et al." data-source-url="https://PENDING EMBARGO.com" data-license="CC BY-NC-ND" data-license-url="http://creativecommons.org/licenses/by-nc-nd/4.0/">
<source src="https://cdn.theconversation.com/audio/2281/music-tp53.mp3" type="audio/mpeg">
</audio>
<div class="audio-player-caption">
TP53 protein music.
<span class="attribution"><a class="source" rel="nofollow" href="https://PENDING%20EMBARGO.com">Zhang et al.</a>, <a class="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/">CC BY-NC-ND</a><span class="download"><span>2.12 MB</span> <a target="_blank" href="https://cdn.theconversation.com/audio/2281/music-tp53.mp3">(download)</a></span></span>
</div></p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/423441/original/file-20210927-15-vgtzsi.png?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="Tumor protein p53 protein structure" src="https://images.theconversation.com/files/423441/original/file-20210927-15-vgtzsi.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/423441/original/file-20210927-15-vgtzsi.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=299&fit=crop&dpr=1 600w, https://images.theconversation.com/files/423441/original/file-20210927-15-vgtzsi.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=299&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/423441/original/file-20210927-15-vgtzsi.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=299&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/423441/original/file-20210927-15-vgtzsi.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=376&fit=crop&dpr=1 754w, https://images.theconversation.com/files/423441/original/file-20210927-15-vgtzsi.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=376&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/423441/original/file-20210927-15-vgtzsi.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=376&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">TP53, or tumor protein p53, produces chromatic music.</span>
<span class="attribution"><a class="source" href="https://alphafold.ebi.ac.uk/entry/P04637">AlphaFold Data/EMBL-EBI</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span>
</figcaption>
</figure>
<p>By guiding analysis of amino acid properties through specific music styles, protein music can sound much more pleasant to the ear. This can be further developed and applied to a wider variety of music styles, including pop and jazz.</p>
<p>Protein music is an example of how combining the biological and computational sciences can produce beautiful works of art. Our hope is that this work will encourage researchers to compose protein music of different styles and inspire the public to learn about the basic building blocks of life.</p>
<p><em>This study was collaboratively developed with Nicole Tay, Fanxi Liu, Chaoxin Wang and Hui Zhang.</em></p>
<p class="fine-print"><em><span>The authors do not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.</span></em></p>
<p>Many features of proteins are analogous to music. Mapping these features together creates new musical compositions that help researchers learn about proteins.</p>
<p>Peng Zhang, Postdoctoral Researcher in Computational Biology, The Rockefeller University; Yuzong Chen, Professor of Pharmacy, National University of Singapore</p>
<p>Licensed as Creative Commons – attribution, no derivatives.</p>
tag:theconversation.com,2011:article/151181 2020-12-02T13:28:54Z 2020-12-02T13:28:54Z
AI makes huge progress predicting how proteins fold – one of biology’s greatest challenges – promising rapid drug development
<figure><img src="https://images.theconversation.com/files/372322/original/file-20201201-15-s2hltf.png?ixlib=rb-1.1.0&rect=5%2C2%2C973%2C431&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">A simple chain of amino acids folds into a complex three-dimensional structure.</span> </figcaption></figure><p><strong>Takeaways</strong></p>
<ul>
<li><p><strong>A “deep learning” software program from Google-owned lab DeepMind showed great progress in solving one of biology’s greatest challenges – understanding protein folding.</strong> </p></li>
<li><p><strong>Protein folding is the process by which a protein takes its shape from a string of building blocks to its final three-dimensional structure, which determines its function.</strong></p></li>
<li><p><strong>By better predicting how proteins take their structure, or “fold,” scientists can more quickly develop drugs that, for example, block the action of crucial viral proteins.</strong> </p></li>
</ul>
<hr>
<p>Solving what biologists call “the protein-folding problem” is a big deal. Proteins are the workhorses of cells and are present in all living organisms. They are made up of long chains of amino acids and are vital for the structure of cells and communication between them as well as regulating all of the chemistry in the body. </p>
<p>This week, the Google-owned artificial intelligence company <a href="https://www.deepmind.com">DeepMind</a> demonstrated a deep-learning program called <a href="https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology">AlphaFold2</a>, which experts are calling a <a href="https://www.nature.com/articles/d41586-020-03348-4">breakthrough</a> toward solving the grand challenge of <a href="https://doi.org/10.1038/d41586-020-03348-4">protein folding</a>. </p>
<p>Proteins are long chains of amino acids linked together like beads on a string. But for a protein to do its job in the cell, it must “fold” – a process of twisting and bending that transforms the molecule into a complex three-dimensional structure that can interact with its target in the cell. If the folding is disrupted, then the protein won’t form the correct shape – and it won’t be able to perform its job inside the body. This can lead to disease, as is the case in common diseases like Alzheimer’s and rare ones like cystic fibrosis.</p>
<p>Deep learning is a computational technique that uses the often hidden information contained in vast datasets to solve questions of interest. It’s been used widely in fields such as games, speech and voice recognition, autonomous cars, science and medicine.</p>
<p>I believe that tools like AlphaFold2 will help scientists to design new types of proteins, ones that may, for example, help break down plastics and fight future viral pandemics and disease. </p>
<p><a href="https://scholar.google.com/citations?user=RpiSPiwAAAAJ&hl=en">I am a computational chemist</a> and author of the book <a href="https://rowman.com/ISBN/9781633886407/The-State-of-Science-What-the-Future-Holds-and-the-Scientists-Making-It-Happen">The State of Science</a>. My students and I study the structure and properties of <a href="https://www.conncoll.edu/ccacad/zimmer/GFP-ww/GFP-1.htm">fluorescent proteins</a> using protein-folding computer programs based on classical physics. </p>
<p>After decades of study by thousands of research groups, these protein-folding prediction programs are very good at calculating structural changes that occur when we make small alterations to known molecules. </p>
<p>But they haven’t adequately managed to predict how proteins fold from scratch. Before deep learning came along, the protein-folding problem seemed impossibly hard, and it seemed poised to frustrate computational chemists for many decades to come.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/372313/original/file-20201201-23-12msmry.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/372313/original/file-20201201-23-12msmry.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/372313/original/file-20201201-23-12msmry.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=510&fit=crop&dpr=1 600w, https://images.theconversation.com/files/372313/original/file-20201201-23-12msmry.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=510&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/372313/original/file-20201201-23-12msmry.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=510&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/372313/original/file-20201201-23-12msmry.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=641&fit=crop&dpr=1 754w, https://images.theconversation.com/files/372313/original/file-20201201-23-12msmry.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=641&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/372313/original/file-20201201-23-12msmry.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=641&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">A chain of amino acids goes through several folding steps, which occur through hydrogen bonds between amino acids in different regions of the protein, before arriving at the final structure. The example shown here is hemoglobin, a protein in red blood cells that transports oxygen to body tissues.</span>
<span class="attribution"><a class="source" href="https://upload.wikimedia.org/wikipedia/commons/2/26/225_Peptide_Bond-01.jpg">Anatomy & Physiology, Connexions website</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span>
</figcaption>
</figure>
<h2>Protein folding</h2>
<p>The sequence of the amino acids – which is encoded in DNA – defines the protein’s 3D shape. The shape determines its function. If the structure of the protein changes, it is unable to perform its function. Correctly predicting protein folds based on the amino acid sequence could revolutionize drug design, and explain the causes of new and old diseases. </p>
<p>All proteins with the same sequence of amino acid building blocks fold into the same three-dimensional form, which optimizes the interactions between the amino acids. They do this within milliseconds, although they have an astronomical number of possible configurations available to them – <a href="https://web.archive.org/web/20110523080407/http://www-miller.ch.cam.ac.uk/levinthal/levinthal.html">about 10 to the power of 300</a>. This massive number is what makes it hard to predict how a protein folds even when scientists know the full sequence of amino acids that go into making it. Previously, predicting the structure of a protein from its amino acid sequence was impossible. Protein structures had to be determined experimentally, a time-consuming and expensive endeavor. </p>
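<p>To get a feel for the scale of that number, here is a rough back-of-the-envelope sketch in Python. Only the figure of 10 to the power of 300 comes from the text above. The sampling rate of 10 trillion configurations per second and the age of the universe (about 13.8 billion years) are illustrative assumptions, and the function names are hypothetical.</p>

```python
# Back-of-the-envelope estimate: how long would it take to try every
# possible configuration of a protein one by one?
# ASSUMPTIONS (illustrative only): the sampling rate and the age of the universe.

CONFIGURATIONS = 10 ** 300               # figure quoted in the article
SAMPLES_PER_SECOND = 10 ** 13            # assumed: 10 trillion configurations per second
SECONDS_PER_YEAR = 365 * 24 * 60 * 60    # ignoring leap years
AGE_OF_UNIVERSE_YEARS = 138 * 10 ** 8    # assumed: about 13.8 billion years


def exhaustive_search_years(configurations: int, samples_per_second: int) -> int:
    """Whole years needed to visit every configuration exactly once."""
    return configurations // (samples_per_second * SECONDS_PER_YEAR)


def order_of_magnitude(n: int) -> int:
    """Power of ten of a positive integer (its digit count minus one)."""
    return len(str(n)) - 1


years = exhaustive_search_years(CONFIGURATIONS, SAMPLES_PER_SECOND)
universe_lifetimes = years // AGE_OF_UNIVERSE_YEARS

print(f"Exhaustive search: roughly 10^{order_of_magnitude(years)} years")
print(f"Roughly 10^{order_of_magnitude(universe_lifetimes)} times the age of the universe")
```

<p>Whatever sampling rate one assumes, the conclusion is the same: a protein that folds within milliseconds cannot be trying every configuration in turn, and neither can a program that hopes to predict the result.</p>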
<p>Once researchers can better predict how proteins fold, they’ll be able to better understand how cells function and how misfolded proteins cause disease. Better protein prediction tools will also help us design drugs that can target a particular topological region of a protein where chemical reactions take place. </p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/372314/original/file-20201201-23-86jeuv.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/372314/original/file-20201201-23-86jeuv.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=300&fit=crop&dpr=1 600w, https://images.theconversation.com/files/372314/original/file-20201201-23-86jeuv.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=300&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/372314/original/file-20201201-23-86jeuv.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=300&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/372314/original/file-20201201-23-86jeuv.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=377&fit=crop&dpr=1 754w, https://images.theconversation.com/files/372314/original/file-20201201-23-86jeuv.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=377&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/372314/original/file-20201201-23-86jeuv.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=377&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">What’s your move?</span>
<span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/photo/robot-hand-chessboard-royalty-free-image/1255171787?adppopup=true">style-photography/Getty Images</a></span>
</figcaption>
</figure>
<h2>AlphaFold is born from deep-learning chess, Go and poker games</h2>
<p>The success of DeepMind’s protein-folding prediction program, called <a href="https://deepmind.com/research/case-studies/alphafold">AlphaFold</a>, is not unexpected. Other deep-learning programs written by <a href="https://deepmind.com/about">DeepMind</a> have demolished the world’s best chess, Go and poker players.</p>
<p>In 2016 <a href="https://www.chessprogramming.org/Stockfish">Stockfish-8</a>, an open-source chess engine, was the world’s computer chess champion. It evaluated 70 million chess positions per second and had centuries of accumulated human chess strategies and decades of computer experience to draw upon. It played efficiently and brutally, mercilessly beating all its human challengers without an ounce of finesse. Enter deep learning. </p>
<p>On Dec. 7, 2017, Google’s deep-learning chess program <a href="http://doi.org/10.1126/science.aar6404">AlphaZero</a> thrashed Stockfish-8. The chess engines played 100 games, with AlphaZero winning 28 and drawing 72. It didn’t lose a single game. AlphaZero evaluated only 80,000 positions per second, as opposed to Stockfish-8’s 70 million per second, and it took just four hours to learn chess from scratch by playing against itself a few million times and optimizing its neural networks as it learned from its experience. </p>
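<p>The figures quoted above can be put side by side with a few lines of arithmetic. This minimal Python sketch assumes standard chess scoring (one point for a win, half a point for a draw); the variable and function names are purely illustrative.</p>

```python
# Arithmetic on the match figures quoted in the article.
# ASSUMPTION: standard chess scoring (win = 1 point, draw = 0.5 points).

GAMES = 100
ALPHAZERO_WINS = 28
DRAWS = 72
ALPHAZERO_PER_SECOND = 80_000        # evaluations per second, as quoted
STOCKFISH_PER_SECOND = 70_000_000    # evaluations per second, as quoted


def match_score(wins: int, draws: int) -> float:
    """Total points under standard chess scoring."""
    return wins + 0.5 * draws


alphazero_losses = GAMES - ALPHAZERO_WINS - DRAWS
alphazero_score = match_score(ALPHAZERO_WINS, DRAWS)
stockfish_score = GAMES - alphazero_score
speed_ratio = STOCKFISH_PER_SECOND // ALPHAZERO_PER_SECOND

print(f"AlphaZero losses: {alphazero_losses}")
print(f"Match score: {alphazero_score} to {stockfish_score}")
print(f"Stockfish-8 evaluated {speed_ratio} times more per second")
```

<p>In other words, AlphaZero won the match while examining several hundred times fewer possibilities per second than its opponent.</p>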
<p><a href="https://web.stanford.edu/%7Esurag/posts/alphazero.html">AlphaZero</a> didn’t learn anything from humans or chess games played by humans. It taught itself and, in the process, derived strategies never seen before. In a <a href="https://doi.org/10.1126/science.aaw2221">commentary</a> in Science magazine, former world chess champion Garry Kasparov wrote that by learning from playing itself, AlphaZero developed strategies that “reflect the truth” of chess rather than reflecting “the priorities and prejudices” of the programmers. “It’s the embodiment of the cliché ‘work smarter, not harder.’” </p>
<figure>
<iframe width="440" height="260" src="https://www.youtube.com/embed/gg7WjuFs8F4?wmode=transparent&start=0" frameborder="0" allowfullscreen=""></iframe>
<figcaption><span class="caption">How do proteins fold?</span></figcaption>
</figure>
<h2>CASP – the Olympics for molecular modelers</h2>
<p>Every two years, the world’s top computational chemists test the abilities of their programs to predict the folding of proteins and compete in the <a href="https://predictioncenter.org">Critical Assessment of Structure Prediction</a> (CASP) competition. </p>
<p>In the competition, teams are given the linear sequence of amino acids for about 100 proteins for which the 3D shape is known but hasn’t yet been published; they then have to compute how these sequences would fold. In 2018 AlphaFold, the deep-learning rookie at the competition, beat all the traditional programs – but barely. </p>
<p>Two years later, on Monday, it was announced that AlphaFold2 had won the 2020 competition by a healthy margin. It whipped its competitors, and its predictions were comparable to the existing experimental results determined through gold standard techniques like X-ray diffraction crystallography and cryo-electron microscopy. Soon I expect AlphaFold2 and its progeny will be the methods of choice to determine protein structures before resorting to experimental techniques that require painstaking, laborious work on expensive instrumentation.</p>
<p>One of the reasons for AlphaFold2’s success is that it could use the <a href="https://www.rcsb.org/">Protein Data Bank</a>, which has over 170,000 experimentally determined 3D structures, to train itself to calculate the correctly folded structures of proteins. </p>
<p>The potential impact of AlphaFold can be appreciated if one compares the number of all published protein structures – approximately 170,000 – with the 180 million DNA and protein sequences deposited in the <a href="https://www.uniprot.org">Universal Protein Resource</a> (UniProt). AlphaFold will help us sort through treasure troves of DNA sequences hunting for new proteins with unique structures and <a href="https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology">functions</a>.</p>
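<p>The gap between those two numbers is easy to quantify. The short Python sketch below uses only the approximate counts quoted above; the variable names are illustrative, and the result is a rough proportion rather than a precise statistic.</p>

```python
# How much of the known sequence space has an experimentally solved structure?
# Both counts are the approximate figures quoted in the article.

KNOWN_STRUCTURES = 170_000
KNOWN_SEQUENCES = 180_000_000

coverage_percent = 100 * KNOWN_STRUCTURES / KNOWN_SEQUENCES
sequences_per_structure = KNOWN_SEQUENCES // KNOWN_STRUCTURES

print(f"Sequences with a solved structure: about {coverage_percent:.2f}%")
print(f"Roughly one structure for every {sequences_per_structure} sequences")
```

<p>By this rough count, fewer than one sequence in a thousand has an experimentally determined structure.</p>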
<h2>Has AlphaFold made me, a molecular modeler, redundant?</h2>
<p>As with the chess and Go programs – AlphaZero and AlphaGo – we don’t exactly know what the AlphaFold2 algorithm is doing and why it uses certain correlations, but we do know that it works. </p>
<p>Besides helping us predict the structures of important proteins, understanding AlphaFold’s “thinking” will also help us gain new insights into the mechanism of protein folding.</p>
<p>One of the most common fears expressed about AI is that it will lead to large-scale unemployment. AlphaFold still has a significant way to go before it can consistently and successfully predict protein folding. </p>
<p>However, once it has matured and the program can simulate protein folding, computational chemists will be integrally involved in improving the programs, trying to understand the underlying correlations used, and applying the program to solve important problems like the protein misfolding associated with many diseases, including Alzheimer’s, Parkinson’s, cystic fibrosis and Huntington’s disease. </p>
<p>AlphaFold and its offspring will certainly change the way computational chemists work, but it won’t make them redundant. Other areas won’t be as fortunate. In the past robots were able to replace humans doing manual labor; with AI, our cognitive skills are also being challenged.</p>
<p class="fine-print"><em><span>Marc Zimmer does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>
<p>Scientists in an artificial intelligence lab have made a breakthrough in solving the problem of how proteins fold into their final three-dimensional shape. The work could speed up creation of drugs.</p>
<p>Marc Zimmer, Professor of Chemistry, Connecticut College</p>
<p>Licensed as Creative Commons – attribution, no derivatives.</p>