Google’s DeepMind division has rolled out several impressive AI models, including one that can play StarCraft II better than you (and almost anyone else). DeepMind isn’t only interested in AI for playing games, though. Last year, the company released AlphaFold, a machine learning model that can predict the shape of proteins. Now, DeepMind has announced that it has generated predicted structures for the more than 200 million proteins in the centralized UniProt database. This is a big deal for basic biological research as well as for efforts to tackle some of the most important scientific conundrums of our time.
Proteins are the basis of all biological life on Earth, but knowing the amino acid sequence of a protein doesn’t tell you what it does or how it works. The sequence gives a protein patterns of positive and negative charges, hydrophilic and hydrophobic regions, and cross-linked segments. These features determine the protein’s active shape, known in the lab as its “conformation,” and that conformation is what gives the protein its function. Even small errors in a protein’s fold can be the difference between an enzyme that correctly catalyzes a reaction and one that does literally nothing, which is why structural predictions are only useful if they are highly accurate.
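To make the idea of sequence-encoded “patterns” a little more concrete, here is a minimal Python sketch that reads two simple properties straight off a sequence. It assumes the standard Kyte-Doolittle hydropathy scale and a crude charge model in which only lysine (K) and arginine (R) count as positive and aspartate (D) and glutamate (E) as negative at neutral pH; histidine and terminal charges are ignored. The function names, window size, threshold, and toy sequence are hypothetical choices for illustration. This is not how AlphaFold works: AlphaFold learns far richer relationships between sequence and 3D structure than any per-residue score.

```python
# Minimal sketch: read simple physicochemical "patterns" off an amino acid sequence.
# Illustrative only; this is NOT how AlphaFold predicts structure.

# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 5) -> list[float]:
    """Average hydropathy of each sliding window along the sequence."""
    seq = sequence.upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]  # raises KeyError on non-standard residues
    return [sum(scores[i:i + window]) / window for i in range(len(seq) - window + 1)]


def hydrophobic_windows(sequence: str, window: int = 5, threshold: float = 1.6) -> list[int]:
    """Start indices of windows whose average hydropathy meets a hypothetical threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score >= threshold]


def charge_pattern(sequence: str) -> str:
    """'+' for K/R, '-' for D/E, '.' otherwise: a rough picture of charge at neutral pH."""
    return "".join(
        "+" if aa in "KR" else "-" if aa in "DE" else "." for aa in sequence.upper()
    )


if __name__ == "__main__":
    demo = "KKKKIIIIIKKKK"  # hypothetical toy sequence, not a real protein
    print(charge_pattern(demo))       # ++++.....++++
    print(hydrophobic_windows(demo))  # [3, 4, 5]
```

The toy sequence has a hydrophobic isoleucine core flanked by positively charged lysines, and the two functions flag exactly that: a run of charged residues on either side and a hydrophobic stretch in the middle. Real proteins combine hundreds of such local signals, which is why predicting the final fold is so hard.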
Determining a protein’s conformation experimentally can be a painstaking process, often relying on advanced techniques like X-ray crystallography. AlphaFold complements that experimental work with highly accurate conformation predictions. In a video about the project, a team from the University of Colorado Boulder talks about the challenges of studying proteins involved in bacterial resistance to antibiotics. The team spent ten years puzzling over the shape of a protein that AlphaFold was able to predict in just a few minutes. AlphaFold can do this because it was trained on more than 170,000 known protein structures, which lets it predict how new sequences will fold in 3D.
When DeepMind announced AlphaFold last year, it made the AlphaFold database freely accessible. At the time, only one million structures were available, which makes the 200-fold increase over the past 12 months all the more impressive. DeepMind says AlphaFold has been cited in more than 4,000 scientific papers since its debut, and that it could help scientists understand pressing issues like antibiotic resistance, food security, and the effects of plastic pollution.
With the entire UniProt database now done, DeepMind will provide each predicted structure right on the AlphaFold database web page. The full set of 200 million structures will also be available as a bulk download through Google Cloud Public Datasets.