It’s only been a couple of days since OpenAI released Sora, its new text-to-video model, which generates realistic videos. So far, it is available exclusively to red teamers for identifying potential issues and risks, as well as to artists, designers, and filmmakers for collecting feedback on improvements. “We’re sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon,” OpenAI said in its blog post.

Yet already, there are videos on social media that’ll make you stop to find out more. The videos, up to a minute long, maintain incredible visual quality and adherence to the user’s prompt.

The model can simulate complex scenes, featuring multiple characters, specific motions, and intricate details of the subject and background. Sora uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. 

Here are some of the best examples of what Sora can create:  

A bicycle race on the ocean

As soon as the model was released, Sam Altman began taking requests from users on X to create anything they’d like to see. Kunal Shah, the founder of CRED, replied with the prompt, “A bicycle race on the ocean with different animals as athletes riding the bicycles with a drone camera view.”

His request was fulfilled with a video that features a shark, orca, penguin, turtle, and dolphin all racing on bicycles on water, as if by magic!

White SUV on a dirt road

Jessica Lessin, the editor-in-chief of The Information, has a favourite among the videos initially previewed by OpenAI. It was created with the prompt, “The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from its tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.”

The video follows the prompt to a T.

Pirates clash inside a coffee cup

Jim Fan, Sr. Research Scientist & Lead of AI Agents at NVIDIA, described Sora on X as a data-driven physics engine. He used the video that came out of the prompt, “Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.”

He suggested that the model is likely trained on synthetic data, possibly generated using Unreal Engine 5. The video animates the prompt realistically, simulating fluid dynamics such as the movement of the coffee, achieving photorealism, and applying real-world physics to a fantastical scenario.

Golden Retrievers and Snow

Srinivas Mohan, an Indian visual effects designer, coordinator and supervisor known for his work on Tamil, Telugu and Malayalam films such as RRR, Bahubali and Enthiran, posted on X an adorable close-up video of three golden retrievers that pop their heads out of the snow and carry on playing in it. The video accurately captures the prompt, “A litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in it.”

A garden cat

Ben Nash, a full-stack web designer and developer, posted a video on X of an orange tabby cat exploring a garden. The detailed prompt read, “A white and orange tabby cat is seen happily darting through a dense garden, as if chasing something. Its eyes are wide and happy as it jogs forward, scanning the branches, flowers, and leaves as it walks. The path is narrow as it makes its way between all the plants. The scene is captured from a ground-level angle, following the cat closely, giving a low and intimate perspective. The image is cinematic with warm tones and a grainy texture. The scattered daylight between the leaves and plants above creates a warm contrast, accentuating the cat’s orange fur. The shot is clear and sharp, with a shallow depth of field.”

He then ran the same prompt on Pika, Runway, Leonardo and FinalFrame and shared the results side by side. None of them came close!

Surfing indoors

The prompt reads, “In an ornate, historical hall, a massive tidal wave peaks and begins to crash. Two surfers, seizing the moment, skillfully navigate the face of the wave.” The model captured this bizarre request stunningly. The user Angry Penguin posted the video on X, along with a few other equally bizarre ones.

This Inception-esque video gives Hollywood a run for its money.

Snow and Sakura

Instead of planning a trip to Japan or even looking at pictures taken by travellers, you can have Sora give you a cinematic, panoramic view of the scenery.

OpenAI’s demo used the prompt, “Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.”

A stroll in Mumbai rains

Anu Aakash, an AI content creator, posted a video on X of a woman in bright purple overalls and cowboy boots taking a casual stroll through the streets of Mumbai during the monsoon season. Unsurprisingly, that was exactly the prompt given to Sora. The depth of the shot, with a car moving towards the viewer against a backdrop of old Bombay buildings, really captures the prompt.

Minecraft with Sora

In 2022, OpenAI trained a neural network to play Minecraft. This time, Sora was used to create a video of the game. Jim Fan posted the video on X with the title “Minecraft has been achieved internally.” The video, however accurate, has an almost realistic sky, which he pointed out, saying, “It can’t resist the urge to make the sky look less pixelated 😅”

Attenborough and Sora

Debarghya Das, the head of product at Glean Assistant, combined multiple AI tools to make a stunning video of sea creatures floating in the sky above a town. To the Sora footage he added David Attenborough’s voice using Eleven Labs and nature music sampled from YouTube, putting it all together in iMovie. The video transitions from a regular town to a magical one ‘through a portal’ to show a stunning transformation.

He pointed out, “People aren’t taking the ‘everyone will be filmmakers’ seriously enough.”