TechingToday

Stability AI's new image-to-3D conversion tool is tried and tested.

Stability AI, maker of the Stable Diffusion family of AI image models, has announced TripoSR, a new image-to-3D tool that can quickly transform images into 3D objects.

The number of generative 3D models continues to grow, but what makes TripoSR stand out is the speed with which it creates new objects and its ability to run on a laptop.

On my M2 MacBook Air, I was able to get the model running in about 10 minutes using Pinokio's 1-click installer. Generating an object from a simple image took about 1 minute.

Using a cloud version of the AI model, other users were able to run it on the Apple Vision Pro, generating 3D objects from photos and loading them as interactive objects without removing the headset.

TripoSR is the result of a partnership between Stability AI and Tripo AI, an AI-powered 3D modeling startup from VAST AI Research.

The tool can take any image, remove the background, and transform it into a fully rendered 3D object that can be interacted with.

The image serves as the basis for the 3D reconstruction. Pre-trained encoders transform it into vectors that capture its global and local features.

These vectors carry the information needed to generate the 3D object. TripoSR learns during training to "guess" details such as camera parameters and position, so no additional input is required.

This is why it is fast to generate, but also why the backsides of the generated models sometimes lack detail.

The models are fun and reasonably high-resolution, but in my tests I struggled with the backs of the models, which often came up blank. The most impressive development, however, is the speed of generation. On my Mac, it generates obj files in 30 seconds to a minute, and on a machine with an NVIDIA H100 Tensor Core GPU, it reportedly generates a file from an image in 0.5 seconds.
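For readers who want to check a generated file before importing it anywhere, here is a minimal sketch that counts the vertices and faces in Wavefront OBJ text and reports the bounding box. It is not part of TripoSR; the function name `summarize_obj` and the sample mesh are made up for illustration, and the parser only handles the basic `v` and `f` records.

```python
from typing import NamedTuple, Tuple


class MeshSummary(NamedTuple):
    vertices: int
    faces: int
    bbox_min: Tuple[float, float, float]
    bbox_max: Tuple[float, float, float]


def summarize_obj(text: str) -> MeshSummary:
    """Count vertices and faces in OBJ text and compute the bounding box.

    Hypothetical helper for illustration; it ignores normals, texture
    coordinates, and every other record type.
    """
    coords = []
    faces = 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            # A vertex line is "v x y z", optionally followed by extra
            # values (such as vertex colors); only x, y, z are used here.
            coords.append(tuple(float(p) for p in parts[1:4]))
        elif parts[0] == "f":
            faces += 1
    if not coords:
        raise ValueError("no vertices found in OBJ text")
    bbox_min = tuple(min(c[i] for c in coords) for i in range(3))
    bbox_max = tuple(max(c[i] for c in coords) for i in range(3))
    return MeshSummary(len(coords), faces, bbox_min, bbox_max)


# A tiny hand-written tetrahedron stands in for a generated file.
SAMPLE_OBJ = """\
# sample mesh (not TripoSR output)
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 0.0 1.0 0.0
v 0.0 0.0 1.0
f 1 2 3
f 1 2 4
f 1 3 4
f 2 3 4
"""

summary = summarize_obj(SAMPLE_OBJ)
print(summary)
```

To inspect a real file, read its contents into a string and pass that to `summarize_obj`. A bounding box that is unusually thin along one axis can be a hint that the back of the model came out flat.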

The objects are interactive and, once a suitable starting image is selected, TripoSR did a better job of turning it into a 3D object than other tools I have tried, including one that uses a cell phone to perform a complete 3D lidar scan.

This near real-time generation of single objects could lead to true virtual worlds created on the fly, and potentially to games that change in response to user interaction.

If this were built into a headset such as the Apple Vision Pro, users could generate new artwork and objects to bring into their view, or even turn real-world objects into virtual ones that they can interact with in full VR.

For now, the primary use is to create virtual art that can be imported into Blender, Unity, or Unreal Engine for use in virtual scenes and game development.
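To put the quoted timings in perspective, the short sketch below converts seconds per object into objects per minute. The only inputs are the figures given in this article (30 to 60 seconds on the M2 MacBook Air, a reported 0.5 seconds on an H100); the function name is made up for illustration.

```python
def objects_per_minute(seconds_per_object: float) -> float:
    """Convert a per-object generation time into a per-minute rate."""
    if seconds_per_object <= 0:
        raise ValueError("seconds_per_object must be positive")
    return 60.0 / seconds_per_object


# Timings as quoted in the article; the H100 figure is reported, not measured here.
timings = {
    "M2 MacBook Air (slow case)": 60.0,
    "M2 MacBook Air (fast case)": 30.0,
    "NVIDIA H100 (reported)": 0.5,
}

for label, seconds in timings.items():
    print(f"{label}: {objects_per_minute(seconds):.0f} objects per minute")

# How many times faster the reported H100 figure is than the laptop range.
speedup_range = (30.0 / 0.5, 60.0 / 0.5)
print(f"H100 speedup over the laptop: {speedup_range[0]:.0f}x to {speedup_range[1]:.0f}x")
```

On these numbers the laptop produces one or two objects a minute while the H100 figure works out to 120, which is the gap between waiting for an asset and generating one in response to a user.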