OpenAI surprises again – AI Sora turns text queries into photorealistic videos

OpenAI has announced a new video generation model called Sora. The company says Sora is “capable of creating photorealistic scenes based on text queries,” and its demo videos look better than the output of any previous model designed for this purpose. Currently, the text-to-video model allows users to create videos up to a minute long based only on the prompts they enter.


According to OpenAI, Sora is capable of creating “complex scenes with multiple characters, specific types of motion, and precise object and background details.” The company also notes that the model can understand the physics of objects in the real world, as well as “accurately interpret props and create convincing characters with powerful emotions.”

The model can generate video from a single frame, fill in missing frames, or extend an existing video. Examples on the OpenAI blog include a California gold rush scene, a view from a train window in Tokyo, and others. Many of the clips show characteristic AI artifacts, such as a suspiciously moving floor in a museum. Overall, though, the results are impressive.

It wasn’t long ago that the leaders in generative AI were text-to-image models like Midjourney, but video quality has been improving rapidly. Companies like Runway and Pika have shown off impressive text-to-video models, and Google’s Lumiere promises to be one of Sora’s main competitors. Like Sora, Lumiere provides text-based video generation tools and can create videos from a single frame.

For now, access to Sora is limited to a select few testers who are evaluating the model for potential harm, as well as some artists and filmmakers. OpenAI notes that the current model may fail to correctly simulate the physics of complex scenes and may misinterpret cause-and-effect relationships.

You can read more about Sora on the company’s blog.



