Meta’s Movie Gen looks like a huge leap forward for AI video (but you can’t use it yet)


At this point, you probably either love the idea of making realistic videos with generative AI, or you think it’s a morally bankrupt endeavor that devalues artists and will usher in a disastrous era of deepfakes we’ll never escape from. It’s hard to find middle ground. Meta isn’t going to change minds with Movie Gen, its latest video creation AI model, but no matter what you think of AI media creation, it could end up being a significant milestone for the industry.

Movie Gen can produce realistic videos alongside music and sound effects at 16 fps or 24 fps at up to 1080p (upscaled from 768 by 768 pixels). It can also generate personalized videos if you upload a photo, and crucially, it appears to make editing videos with simple text commands easy. Notably, it can also edit normal, non-AI videos with text. It's easy to imagine how that could be useful for cleaning up something you've shot on your phone for Instagram. Movie Gen is purely research at the moment; Meta won't be releasing it to the public, so we have a bit of time to think about what it all means.

The company describes Movie Gen as its “third wave” of generative AI research, following its initial media creation tools like Make-A-Scene, as well as more recent offerings using its Llama AI model. It’s powered by a 30 billion parameter transformer model that can make 16-second-long 16 fps videos, or 10-second-long 24 fps footage. It also has a 13 billion parameter audio model that can make 45 seconds of 48kHz audio like “ambient sound, sound effects (Foley), and instrumental background music” synchronized to video. There’s no synchronized voice support yet “due to our design choices,” the Movie Gen team wrote in their research paper.

Meta Movie Gen (Image: Meta)

Meta says Movie Gen was initially trained on “a combination of licensed and publicly available datasets,” including around 100 million videos, a billion images and a million hours of audio. The company’s language is a bit fuzzy when it comes to sourcing — Meta has already admitted to training its AI models on data from every Australian user’s account, and it’s even less clear what the company is using outside of its own products.

As for the actual videos, Movie Gen certainly looks impressive at first glance. Meta says that in its own A/B testing, people have generally preferred its results to those from OpenAI’s Sora and Runway’s Gen-3 model. Movie Gen’s AI humans look surprisingly realistic, without many of the gross telltale signs of AI video (disturbing eyes and fingers, in particular).


“While there are many exciting use cases for these foundation models, it’s important to note that generative AI isn’t a replacement for the work of artists and animators,” the Movie Gen team wrote in a blog post. “We’re sharing this research because we believe in the power of this technology to help people express themselves in new ways and to provide opportunities to people who might not otherwise have them.”

It’s still unclear what mainstream users will do with generative AI video, though. Are we going to fill our feeds with AI video, instead of taking our own photos and videos? Or will Movie Gen be deconstructed into individual tools that can help sharpen our own content? We can already easily remove objects from the backgrounds of photos on smartphones and computers; more sophisticated AI video editing seems like the next logical step.
