OpenAI is teasing the capabilities of its new video generator — and honestly, it looks pretty impressive.

In a post on X-formerly-Twitter, CEO Sam Altman introduced the text-to-video model, which he said can "create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions."

It's unclear what emotions are supposed to be on display in the first of the generated videos Altman shared, given that it's of a couple with their backs to the "camera" walking through a snowy Tokyo street. Nevertheless, the video is quite lifelike, and effectively reflects its detailed prompt.

Prior attempts at AI video generation have had a mixed track record. Last month, Google released videos from "Lumiere," a text-to-video model that's better than what came before it, but still clearly limited.

The same can't be said for what we've seen so far of Sora, which is clearly miles ahead of Lumiere.

In Altman's thread and on OpenAI's website, videos generated by Sora display multiple scenes in vivid detail, from photorealistic woolly mammoths and a sci-fi movie trailer to an animated fluffy monster and a "gorgeously rendered papercraft world of a coral reef." Though it's unclear whether the videos in the CEO's thread were edited, those on the website, which feature the California gold rush and a tour of an art gallery among several other scenes, were, according to OpenAI, "generated directly by Sora without modification."

There are open questions, of course. How many videos did OpenAI generate, picking just the best ones for the reveal? And how much computing power, time and electricity did it take to create these samples?

OpenAI also admits that Sora, in its current state, "has weaknesses."

"It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect," the website reads. "For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark."

Those caveats aside, Sora isn't currently available to the public for a different reason. As Altman noted, the company's misinformation and extremism experts are still "adversarially testing" — which is industry slang for intentionally trying to jailbreak — the text-to-video generator.

"We’ll be taking several important safety steps ahead of making Sora available in OpenAI’s products," the company's website reads. "We’re also building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora."

Reading between the lines, it seems that the firm is looking to avoid its own past mistakes, and those of competitors that released their models before making sure they wouldn't, you know, spew out a bunch of hateful lies.

"We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology," the website reads. "Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it."

Following its post-Thanksgiving massacre, OpenAI may well be starting a new chapter — and if these examples are any indication, it's going to become more powerful than ever.

More on OpenAI: OpenAI Hiring Detective to Find Who's Leaking Its Precious Info

