The Holodeck is Here - New AI 3D Video Models Create Animations of Any Object


StabilityAI has released a new AI 3D video model that can turn a single input image into a fully animated view of any object or group of objects.

It is built on top of the open source Stable Video Diffusion model, which is widely used for AI video generation by companies such as Leonardo AI and StabilityAI itself.

Stable Video 3D (SV3D) adds new depth to video generation, creating multi-view 3D meshes from a single image while maintaining higher consistency for objects in the video frame.

Emad Mostaque, founder and CEO of StabilityAI, wrote on X that the model offers a new way of creating a 3D mesh, noting that "all pixels are generated."

Stable Video 3D builds on technology pioneered in previous models such as Stable Video Diffusion, the original Stable Diffusion, and the Zero123 3D image model that StabilityAI released late last year.

At the time, Mostaque said this was just the first in a series of 3D models to be released by the AI lab, which seems to be on a mission to make the Star Trek holodeck a reality.

There are two variants of the new model. The first, SV3D_u, creates an orbital video from a single image input, without any explicit camera conditioning.

The second, SV3D_p, builds on the first by accepting both a single image and an orbit specification, producing a 3D video "captured" along a specified camera path.

Basically, it analyzes a given image and creates multiple views of the object from different angles as if the camera were moving around that object, which then becomes a video.
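To make the camera-path idea concrete, here is a minimal sketch of the kind of orbit a model like SV3D_p could be conditioned on: a sequence of per-frame (elevation, azimuth) angles sweeping a full circle around the object. The function name, frame count, and fixed-elevation choice are illustrative assumptions, not StabilityAI's actual API.

```python
import math

def orbit_camera_path(num_frames=21, elevation_deg=10.0):
    """Generate (elevation, azimuth) pairs in degrees for a circular orbit.

    Hypothetical illustration of a camera path: the azimuth sweeps a full
    360 degrees across the frames while the elevation stays fixed. A more
    dynamic orbit could vary the elevation per frame instead.
    """
    step = 360.0 / num_frames
    return [(elevation_deg, i * step) for i in range(num_frames)]

path = orbit_camera_path()
print(len(path))   # number of frames in the orbit
print(path[0])     # first camera pose: (10.0, 0.0)
```

Each pair describes where the virtual camera sits relative to the object for one output frame; the model's job is then to render what the object would look like from that viewpoint.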

I have not yet tried SV3D, but from the sample clips, it seems to do a good job of capturing the object and predicting unseen views as well as camera movement.

So far, all clips have focused on a single object on a white background. While this may prove useful for companies that want to easily include a full 360-degree view on their products, the authenticity of the reverse view is questionable since it is predicted, not real.

It will be interesting to see how it evolves to handle more complex images and whether the camera controls can be applied to complete scenes, such as two people talking or a car spinning around on the road.

Some of the motion depiction techniques could be extended and applied to generative AI video to provide a higher degree of control over how the camera moves in a clip.

They could also be used to create 3D videos of interactive objects or objects in virtual environments such as Meta Quest or Apple Vision Pro.

The source of training data is a particularly sensitive topic that many large AI labs are reluctant to discuss. This includes OpenAI, which has avoided confirming whether YouTube videos are part of Sora's training dataset.

StabilityAI has been open about the source of training data for its latest models, explaining that they are trained on a curated subset of the Objaverse dataset. This is a library of millions of annotated 3D objects used by many AI 3D services.

"We selected a curated subset of the Objaverse dataset as our training data."

The license allows end users to share, adapt, and remix the material as they like, commercially or non-commercially, as long as they provide credit and a link to the license.
