AI predicts protein structure

Dan Sfera
Jul 27, 2021

Building Block Breakthrough

Proteins are essential building blocks of living organisms. Every human cell is replete with them.

Understanding the shapes of proteins is important for making medical advances, yet until recently only a fraction of those shapes had been deciphered. The ability to use artificial intelligence (AI) to predict the structures of almost every protein made by the human body could help accelerate the discovery of new drugs to treat disease.

A program called AlphaFold can predict the structures of 350,000 proteins belonging to humans and other organisms. Instructions for making human proteins are held in the genome, the DNA contained in the nuclei of human cells. The human genome encodes about 20,000 proteins. Biologists call this full complement the “proteome.”

Discussing the results from AlphaFold, Dr. Demis Hassabis, chief executive and co-founder of artificial intelligence company DeepMind, said, “We believe it’s the most complete and accurate picture of the human proteome to date. We believe this work represents the most significant contribution AI has made to advancing the state of scientific knowledge, and I think it’s a great illustration and example of the kind of benefits AI can bring to society. We’re just so excited to see what the community is going to do with this.”

The shape of a protein determines its function in the human body. Hassabis said that the 350,000 protein structures predicted by AlphaFold include the 20,000 in the human proteome as well as those of model organisms used in scientific research, such as E. coli, yeast, the fruit fly and the mouse. The work is described by DeepMind researchers and a team from the European Molecular Biology Laboratory (EMBL) in the journal Nature.

AlphaFold made a confident prediction of the structural positions for 58 percent of the amino acids in the human proteome. The positions of 35.7 percent were predicted with a very high degree of confidence, twice as many as have been confirmed by experiments. Traditional techniques for working out protein structures include X-ray crystallography, cryogenic electron microscopy (cryo-EM) and others, but none of them is easy to carry out. According to Prof. John McGeehan, a structural biologist at the University of Portsmouth, “It takes a huge amount of money and resources to do structures.”

Protein shapes are often determined as part of targeted scientific investigations, but no project had previously determined structures for all the proteins made by the body. Only 17 percent of the proteome is covered by an experimentally confirmed structure.

Prof. McGeehan said, “It’s just the speed — the fact that it was taking us six months per structure and now it takes a couple of minutes. We couldn’t really have predicted that would happen so fast. When we first sent our seven sequences to the DeepMind team, two of those we already had the experimental structures for. So we were able to test those when they came back. It was one of those moments — to be honest — where the hairs stood up on the back of my neck because the structures [AlphaFold] produced were identical.”

According to Prof. Edith Heard from EMBL, “This will be transformative for our understanding of how life works. That’s because proteins represent the fundamental building blocks from which living organisms are made. The applications are limited only by our understanding.”
