OpenAI Has Created an AI Model For Longevity Science


OpenAI has developed a language model designed for engineering proteins that can convert regular cells into stem cells. It marks the company’s first venture into biological data and demonstrates AI’s potential for unexpected scientific discoveries. An anonymous reader quotes a report from MIT Technology Review:
Last week, OpenAI CEO Sam Altman said he was “confident” his company knows how to build an AGI, adding that “superintelligent tools could massively accelerate scientific discovery and innovation well beyond what we are capable of doing on our own.” The protein engineering project started a year ago when Retro Biosciences, a longevity research company based in San Francisco, approached OpenAI about working together. That link-up did not happen by chance: Altman personally funded Retro with $180 million, as MIT Technology Review first reported in 2023. Retro’s goal is to extend the normal human lifespan by 10 years. To that end, it studies the so-called Yamanaka factors, a set of proteins that, when added to a human skin cell, cause it to morph into a young-seeming stem cell, a type that can produce any other tissue in the body. […]

OpenAI’s new model, called GPT-4b micro, was trained to suggest ways to re-engineer the protein factors to increase their function. According to OpenAI, researchers used the model’s suggestions to change two of the Yamanaka factors to be more than 50 times as effective, at least according to some preliminary measures. […] The model does not work the same way as Google’s AlphaFold, which predicts what shape proteins will take. Since the Yamanaka factors are unusually floppy and unstructured proteins, OpenAI said, they called for a different approach, one that its large language models were suited to. The model was trained on examples of protein sequences from many species, as well as information on which proteins tend to interact with one another. While that’s a lot of data, it’s just a fraction of what OpenAI’s flagship chatbots were trained on, making GPT-4b an example of a “small language model” that works with a focused data set.
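For concreteness, here is a rough sketch of what one record in such a focused training set might look like. The field names, format, sequence, and interaction partners below are all assumptions; OpenAI has not published the dataset schema for GPT-4b micro.

```python
# Hypothetical training record pairing a protein sequence with known
# interaction partners. Field names and format are assumptions;
# OpenAI has not described GPT-4b micro's dataset schema.
import json

record = {
    "species": "Homo sapiens",
    "protein": "KLF4",            # one of the four Yamanaka factors
    "sequence": "MKTAYIAKQRLLG",  # dummy placeholder, not the real sequence
    "interacts_with": ["OCT4", "SOX2"],
}

# One JSON object per line, a common layout for language-model corpora.
print(json.dumps(record))
```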

Once Retro scientists were given the model, they tried to steer it to suggest possible redesigns of the Yamanaka proteins. The prompting tactic used is similar to the “few-shot” method, in which a user queries a chatbot by providing a series of examples with answers, followed by an example for the bot to respond to. Although genetic engineers have ways to direct the evolution of molecules in the lab, they can usually test only so many possibilities. And even a protein of typical length can be altered in an astronomical number of ways: built from hundreds of amino acids, each of which comes in 20 possible varieties, a 300-residue protein admits 20^300 distinct sequences. OpenAI’s model, however, often spits out suggestions in which a third of the amino acids in the protein are changed. “We threw this model into the lab immediately and we got real-world results,” says Retro’s CEO, Joe Betts-Lacroix. He says the model’s ideas were unusually good, leading to improvements over the original Yamanaka factors in a substantial fraction of cases.
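As a minimal sketch of that few-shot prompt structure: the sequences below are dummies, the model name is a public stand-in, and neither GPT-4b micro’s interface nor Retro’s actual prompts have been published.

```python
# Minimal sketch of the few-shot prompting pattern described above.
# Everything here is illustrative: the sequences are dummies and the
# model name is a public stand-in, since GPT-4b micro is not available.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Few-shot examples: (original sequence, improved variant) pairs.
examples = [
    ("MKTAYIAKQR", "MKSAYLAKQR"),
    ("GAVLIPFMWS", "GAVLYPFMWS"),
]
target = "MNDQLRSTAV"  # the sequence we want the model to redesign

lines = ["Propose a variant of each protein sequence with improved activity."]
for original, improved in examples:
    lines.append(f"Sequence: {original}")
    lines.append(f"Variant: {improved}")
lines.append(f"Sequence: {target}")
lines.append("Variant:")  # left unanswered, inviting the model's suggestion

response = client.chat.completions.create(
    model="gpt-4o-mini",  # stand-in; GPT-4b micro is not public
    messages=[{"role": "user", "content": "\n".join(lines)}],
)
print(response.choices[0].message.content)
```

The same pattern scales to real factor sequences: each example pair teaches the model the input-output format, and the final unanswered “Variant:” prompts it to propose a new redesign.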


