OpenAI’s Sora 2 lets users insert themselves into AI videos with sound
Source: Ars Technica
Sora social app launches with deepfake-style "cameos" and feed controls.



On Tuesday, OpenAI announced Sora 2, its second-generation video-synthesis AI model that can now generate videos in various styles with synchronized dialogue and sound effects, which is a first for the company. OpenAI also launched a new iOS social app that allows users to insert themselves into AI-generated videos through what OpenAI calls "cameos."

OpenAI showcased the new model in an AI-generated video that features a photorealistic version of OpenAI CEO Sam Altman talking to the camera in a slightly unnatural-sounding voice amid fantastical backdrops, like a competitive ride-on duck race and a glowing mushroom garden.

Regarding that voice, the new model can create what OpenAI calls "sophisticated background soundscapes, speech, and sound effects with a high degree of realism." In May, Google's Veo 3 became the first video-synthesis model from a major AI lab to generate synchronized audio as well as video. Just a few days ago, Alibaba released Wan 2.5, an open-weights video model that can generate audio as well. Now OpenAI has joined the audio party with Sora 2.

OpenAI demonstrates Sora 2's capabilities in a launch video.

The model also shows notable improvements in visual consistency over OpenAI's previous video model, and it can follow more complex instructions across multiple shots while keeping those shots coherent with one another. OpenAI describes the new model as its "GPT-3.5 moment for video," comparing it to the ChatGPT breakthrough in the evolution of its text-generation models.

Sora 2 appears to demonstrate improved physical accuracy over the original Sora model from February 2024, with OpenAI claiming the model can now simulate complex physical movements like Olympic gymnastics routines and triple axels while maintaining realistic physics. Last year, shortly after the launch of Sora 1 Turbo, we saw several notable failures at similar video-generation tasks, failures that OpenAI claims to have addressed with the new model.

"Prior video models are overoptimistic—they will morph objects and deform reality to successfully execute upon a text prompt," OpenAI wrote in its announcement. "For example, if a basketball player misses a shot, the ball may spontaneously teleport to the hoop. In Sora 2, if a basketball player misses a shot, it will rebound off the backboard."

An AI-generated video of vikings with sound, created using OpenAI Sora 2.

We haven't had a chance to evaluate Sora 2 yet, but we will likely test it in a future article. Past experience with video-synthesis models suggests caution about claims of building "world models" that accurately model physics. Despite marketing language about modeling reality, these remain Transformer-based AI models that fundamentally work by pattern-matching training examples to produce outputs, however novel those outputs may appear.

However, with enough video examples and high-quality training techniques, a video-synthesis model can likely build what we once called an "illusion of understanding" that is convincing enough to visually simulate a large portion of reality in various situations without actually "understanding" physics, so to speak.

OpenAI itself acknowledges that Sora 2 "makes plenty of mistakes" but views the model as validation that scaling neural networks on video data will bring the company closer to its goal of simulating reality. The company positions Sora 2 as progress toward "general-purpose world simulators and robotic agents" that it believes will "fundamentally reshape society."

A different approach to social media

Aside from visual and auditory upgrades, OpenAI is taking another big step away from its AI research lab pedigree by making the new model available to average people in an easy-to-use form. It is doing so by packaging Sora 2 into a social iOS app that focuses on creating and sharing AI-generated content.

The new iOS app has already launched in the US and Canada through an invite-based rollout, with plans to expand to additional countries. Users can sign up in the app to be notified when access becomes available for their account. The service will initially be free with what OpenAI describes as "generous limits," though the company plans to offer paid options for additional generations when demand exceeds available compute resources.

Using the app, users can create videos, remix content from other users, and browse a customizable feed of generated videos. As mentioned above, the app's Cameo feature allows users to essentially deepfake themselves by recording a one-time video and audio capture, which the model can then insert into any Sora-generated scene.

An AI-generated video of a gymnast doing flips, which is a notable improvement over Sora 1.

In addition to the basic Sora 2 model on the website and in the app, ChatGPT Pro subscribers will gain access to Sora 2 Pro, described as an experimental higher-quality model. OpenAI also plans to release Sora 2 through its API for developers. The older Sora 1 Turbo model will remain available, and existing creations will stay in users' Sora libraries.

New challenges ahead

So, what could go wrong with an app that can easily put people into AI-generated videos? Well, just about everything. Battling misuse is likely going to be a tricky issue for the company. In the recent past, we've seen cases of nonconsensual AI deepfakes (not related to OpenAI) that have led to lawsuits over bullying, criminal penalties, and suicides.

OpenAI is taking precautions. Given the company's recent sensitivity on the issue following a ChatGPT user's suicide, OpenAI says Sora 2 includes specific protections for teenage users. These include default daily generation limits and strict permissions for the cameos feature. OpenAI says it has deployed both automated safety systems and human moderators to review potential cases of bullying or misuse.

In particular, OpenAI has built in layers of security for the cameos feature. It says that users can maintain control over their uploaded likeness: They can decide who can use their cameo in videos and can revoke access or remove videos containing their likeness at any time. Users can also view all videos containing their cameo, including drafts created by other people.

Beyond deepfakes, the new Sora app has another hurdle to clear. These days, many people view social media skeptically because of its perceived broad negative effects on society. Perhaps reacting to this stigma, OpenAI claims it has designed the new app to avoid common social media pitfalls like doomscrolling and addiction with what it calls a "new class of recommender algorithms" that users can control through natural language instructions, rather than relying on traditional engagement metrics.

"We are not optimizing for time spent in feed, and we explicitly designed the app to maximize creation, not consumption," OpenAI stated in its announcement.