This could be epic.
A neural network from Google's DeepMind has demonstrated that it can dream up short videos from a single image frame, and it's really cool to see how it works.
As DeepMind noted on Twitter, the artificial intelligence model, named "Transframer" — a riff on the "transformer," a common type of AI architecture that generates outputs like text from partial prompts — "excels in video prediction and view synthesis," and is able to "generate 30 [second] videos from a single image."
Transframer is a general-purpose generative framework that can handle many image and video tasks in a probabilistic setting. New work shows it excels in video prediction and view synthesis, and can generate 30s videos from a single image: https://t.co/wX3nrrYEEa 1/ pic.twitter.com/gQk6f9nZyg
— DeepMind (@DeepMind) August 15, 2022
As the Transframer website notes, the AI builds its moving-perspective videos by predicting target images from "context images" — in short, its extensive training data lets it correctly guess what an object, like one of the chairs below, would look like from a different angle, effectively allowing it to "imagine" the unseen sides of a real object.
The model is especially impressive because it appears to apply a kind of artificial depth perception and perspective, generating what a scene would look like if a viewer were to "move" around it. That raises the possibility of entire video games rendered by machine learning instead of traditional graphics techniques.
More food for thought: one Twitter user has already said that he plans to use Transframer in conjunction with outputs from OpenAI's DALL-E image-generating algorithm — a very cool example of the kind of AI-on-AI action we'll likely be seeing a lot more of in the years to come.
GIF via DeepMind
READ MORE: Transframer: Arbitrary Frame Prediction with Generative Models [arXiv]