It wasn’t until 1957 that scientists gained a special kind of access to the molecular third dimension.
After 22 years of painstaking research, John Kendrew of the University of Cambridge finally uncovered the 3D structure of a protein: the folded blueprint of myoglobin, a chain of 154 amino acids that helps oxygenate our muscles. As revolutionary as this discovery was, Kendrew didn’t exactly open the floodgates of protein architecture. Fewer than a dozen more structures would be identified over the next decade.
Now, 65 years after that Nobel Prize-winning breakthrough, Google’s sister company DeepMind announced on Thursday that it has successfully used artificial intelligence to predict the 3D structures of nearly all cataloged proteins known to science. That’s more than 200 million proteins found in plants, bacteria, animals, and humans: just about anything you can imagine.
“Basically, you can think of it as covering the entire protein universe,” DeepMind founder and CEO Demis Hassabis told reporters this week.
The predictions, generated by DeepMind’s AI-based system AlphaFold, are available in an open database, so scientists all over the world can use them in their research anytime, for free. Since AlphaFold’s official launch in July of last year, when its database held only about 350,000 predicted structures, the software has made a noticeable dent in the research landscape.
“More than 500,000 researchers and biologists have used the database to view more than 2 million structures,” Hassabis said. “And these predicted structures have helped scientists make brilliant new discoveries.”
In April, for example, scientists at Yale University called on AlphaFold’s database to help them develop a new, highly effective malaria vaccine. In July last year, scientists from the University of Portsmouth used the system to develop enzymes to combat single-use plastic pollution.
John McGeehan, director of Portsmouth’s Center for Enzyme Innovation and the researcher behind the latest study, told the New York Times: “It put us a year, maybe two years ahead of where we were.”
These efforts are just a small sampling of what researchers have already done with AlphaFold.
“In the last year alone, more than a thousand scientific papers using AlphaFold structures, on a wide range of research topics, have been published; I’ve never seen anything like this,” said Sameer Velankar, a DeepMind collaborator and team leader at the European Molecular Biology Laboratory’s Protein Data Bank, in a press release.
According to Hassabis, the database’s users include researchers trying to improve our understanding of Parkinson’s disease, people hoping to preserve the health of honey bees, and even scientists seeking valuable insight into human evolution.
“AlphaFold is already changing the way we think about the survival of molecules in the fossil record, and I can see it soon becoming a key tool for researchers not only in evolutionary biology, but also in archeology and other paleo-sciences,” Beatrice Demarchi, an associate professor at the University of Turin who recently used the system in research on an ancient egg controversy, said in a press release.
In the coming years, DeepMind also plans to collaborate with teams from the Drugs for Neglected Diseases initiative and the World Health Organization to find treatments for understudied but widespread tropical diseases such as Chagas disease and leishmaniasis.
“It’s going to make a lot of researchers around the world think about what experiments they can do,” Ewan Birney, a DeepMind collaborator and EMBL deputy director, told reporters. “And think about what happens in organisms and the systems they study.”
Locks and keys
So why do so many scientific advances depend on this treasure trove of 3D protein models? Let us explain.
Let’s say you’re trying to make a key that fits a lock perfectly. But you can’t see the structure of this lock. All you know is that the lock exists, a bit about what it’s made of, and possibly some digital information about how big each ridge is and where those ridges should be.
It might not be impossible to make this key, but it would be quite difficult. The key has to be exact, or it won’t work. So before you start, you’d probably try to model several mock locks with whatever information you have, to guide the making of your key.
In this analogy, the lock is a protein and the key is a small molecule that binds to this protein.
For scientists, whether they’re doctors trying to develop new drugs or botanists studying plant anatomy to develop fertilizers, the interactions between certain molecules and proteins are crucial.
With drugs, for example, the specific way a drug molecule binds to a protein can determine whether the drug works at all. This interaction is complicated because, although proteins are simply chains of amino acids, they aren’t straight or flat. They inevitably fold, bend, and sometimes tangle around themselves like the headphone wires in your pocket.
In fact, a protein’s unique fold dictates how it functions, and even the smallest folding errors in the human body can lead to disease.
When it comes to small-molecule drugs, parts of a folded protein are sometimes shielded from binding. A region might, for example, fold in an unusual way that makes it inaccessible. Details like these are crucial for scientists trying to piece together drug molecules. “I think it’s true that almost every drug that has come on the market in the last few years has been designed in part from knowledge of protein structures,” Janet Thornton, a research scientist at EMBL, said at the press conference.
That’s why researchers typically spend an enormous amount of time and effort deciphering the folded 3D structure of the protein they’re working with. It’s the equivalent of starting your key-making journey by first building a model of the lock. If you know the exact structure, it becomes much easier to tell where and how a molecule will bind to a particular protein, and how that binding might change the protein’s folding in response.
But this endeavor is not simple. Or cheap.
“It costs $100,000 to solve a new, unique structure,” said Steve Darnell, a structural and computational biologist at the University of Wisconsin and researcher at the bioinformatics company DNAStar.
That’s because a solution usually comes from elaborate, complex laboratory experiments.
Kendrew, for example, used a technique called X-ray crystallography. Basically, this method requires you to produce solid crystals of the protein of interest, expose them to an X-ray beam, and record the pattern the beam produces. From that pattern, you can work out the approximate positions of thousands of atoms within the crystal, and only then reconstruct the structure of the protein.
There’s also a newer technique known as cryo-electron microscopy. It’s similar in spirit to X-ray crystallography, except that a frozen protein sample is bombarded with electrons instead of X-rays. And although it’s considered higher resolution than some other techniques, it can’t crack everything. Some in the tech world have also tried to predict protein structures computationally, but early attempts in the 1980s and 1990s didn’t go well. And as you can imagine, the laboratory methods are tedious and difficult.
Over the years, barriers like these have made the so-called “protein folding problem” so hard to solve. Scientists simply didn’t know how most proteins fold, and working it out meant clearing significant hurdles.
AlphaFold’s AI could be a game changer.
Solving the ‘Folding Problem’
In short, DeepMind engineers trained AlphaFold to predict protein structures without any laboratory work. No crystals, no electron bombardment, no $100,000 experiments.
To get AlphaFold to where it is today, the system was first exposed to 100,000 known protein structures, according to the company’s website. Then, over time, it learned how to decipher the rest.
It really is that simple. (Well, except for the talent that goes into coding the AI.)
“I don’t know, it takes at least $20,000 and a lot of time to crystallize a protein,” Birney said. “That means experimentalists have to make choices about what they do. AlphaFold doesn’t have to make choices yet.” This comprehensiveness is one of AlphaFold’s most fascinating features. It means scientists have more freedom to guess and test, follow their gut, and cast a wide net in their research when it comes to protein structures, without worrying as much about costs or deadlines.
“Models also come with prediction error,” said Jan Kosinski, a DeepMind collaborator and structural modeler at EMBL in Hamburg, Germany. “And usually, in fact in many cases, the error is really small. That’s why we call it subatomic precision.”
The DeepMind team also said it conducted various risk assessments to make sure AlphaFold is safe and ethical to use. Team members also suggested that AI in general may carry biosecurity risks that we hadn’t thought to assess before, especially as such technology continues to permeate the medical space.
As the technology evolves, the DeepMind team says it will adapt AlphaFold and address such concerns case by case. For now, it works, with a universe of protein models that traces back to a modest portrait of myoglobin.
“Just two years ago,” Birney said, “we just didn’t realize it was possible.”
Update at 6:45 a.m. PT: Janet Thornton’s first name and surname have been clarified.