TechingToday

Google Claims New AI Video Generator "Lumiere" Works with Space and Time to Create Great Clips


Google has announced a new artificial intelligence video model called Lumiere.

Many existing AI video models struggle to keep movement consistent. Even when they manage to render a natural gait, other elements become choppy or blend into the background.

Lumiere takes a different approach to video generation. Instead of stitching together individually generated frames, it creates the entire video in a single process, handling where objects are placed and how they move at the same time.

While the preview clips look impressive, this is only a research project, so you can't try it yourself. However, the underlying technology and its approach to AI video could be integrated into future Google products, making Google a major player in this field.

Lumiere supports both text-to-video and image-to-video generation. It also offers stylized generation, which uses a reference image to fine-tune exactly how elements in a video look. Some of these capabilities are already available in models from Runway and Pika Labs.

The AI model is built on a spatio-temporal architecture. That sounds like something out of a science fiction movie, but in reality it simply means the model accounts for both where things are in the frame and how they move over time.

During generation, the model handles both the "spatial" aspect of the clip (where objects are) and the "temporal" aspect (when and how they move). To keep motion consistent, it processes both at once, in a single pass.

The researchers write in their preprint paper on the model: "Our model learns to directly generate low-resolution video at full frame rate by processing at multiple spatiotemporal scales."
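To make the idea of "multiple spatiotemporal scales" a little more concrete, here is a minimal Python sketch, assuming a video is stored as a NumPy array of shape (frames, height, width, channels). It is not Lumiere's code: the helper names `spacetime_downsample` and `spacetime_pyramid` are hypothetical, and the fixed averaging is a simplified stand-in for the learned layers a real model would use. It only illustrates how shrinking time and space together yields a coarser version of the whole clip, instead of treating frames one at a time.

```python
import numpy as np


def spacetime_downsample(video: np.ndarray, factor: int = 2) -> np.ndarray:
    """Average-pool a (T, H, W, C) video by `factor` along time, height and width together.

    Hypothetical helper for illustration; not Lumiere's implementation.
    """
    t, h, w, c = video.shape
    if t % factor or h % factor or w % factor:
        raise ValueError("frames, height and width must be divisible by factor")
    # Split each of T, H and W into (coarse, factor) blocks, then average inside each block,
    # so every output value summarises a small cube of neighbouring frames and pixels.
    blocks = video.reshape(t // factor, factor, h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3, 5))


def spacetime_pyramid(video: np.ndarray, levels: int = 3, factor: int = 2) -> list:
    """Return the video at `levels` progressively coarser space-time scales."""
    pyramid = [video]
    for _ in range(levels - 1):
        pyramid.append(spacetime_downsample(pyramid[-1], factor))
    return pyramid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((16, 64, 64, 3))  # 16 frames of 64x64 RGB noise as a stand-in video
    for level in spacetime_pyramid(clip):
        print(level.shape)
```

Each level halves the frame count as well as the height and width, so the 16-frame, 64×64 stand-in clip becomes 8 frames at 32×32 and then 4 frames at 16×16. The intuition is that the coarser levels let a model reason about motion across the entire clip cheaply before working back up to full detail.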

When generative AI video first began to emerge, the main focus was on creating short clips, but as the technology matured, other features appeared. Runway, for example, offers the ability to highlight different regions of an image and animate them independently.

The Google research team states that Lumiere achieves "state-of-the-art text-to-video generation results" and "facilitates a wide range of content creation tasks and video editing applications."

The team also describes Lumiere's text-to-video generation as a very powerful tool. Not only can smoother motion be expected, the team says, but the model can also animate specific areas of an image with relative ease and provide inpainting capabilities, such as changing the style of clothing or the type of animal in a frame.

Many research projects from companies such as Google, Microsoft, and Meta never make it past the preview stage. However, their underlying technology often finds its way into branded products.

This is not even Google's first AI video tool. There is a video version of Imagen, the model that powers Google Cloud's AI image generation, and there is VideoPoet, a large language model for zero-shot video generation.

VideoPoet can also generate audio from video clips without requiring text as a guide. According to Google, VideoPoet can generate videos of arbitrary length while preserving strong object identity, by repeatedly generating one-second extensions. It is not currently publicly available either.

Whether Lumiere ever reaches the real world depends on how well it is received by researchers and whether Google considers it worth pursuing further. Even then, if it is offered through Google Cloud like Imagen, it may be largely reserved for third-party developers.