Google DeepMind creates an AI model that can add sound to silent video

After a few days of AI-animated memes, attention is turning to silent video, specifically to adding audio to AI-generated clips.

Google's DeepMind research arm has built a powerful new AI model that can take a video with no sound and dub over it with sound effects and music.

The most striking thing about the new research is how closely the audio tracks the visuals. In one clip, a close-up of a guitar being played is paired with generated music that closely matches the notes actually being fingered.

In some ways it builds on what we saw last month from ElevenLabs, generating sound based on visual prompts, and it could help revive older media whose audio components no longer survive.

The Google DeepMind model isn't available yet, but a similar tool from ElevenLabs can be tried today. If you want to create a video to test it with, check out our list of the 5 best AI video generators.

In a thread on X, Google DeepMind's account starts things off with a clip of a character walking through an eerily illuminated tunnel.

Over the dramatic percussion, some light choral music can be heard, and the character's footsteps are audible as they move through the scene.

Next, audio generated from the prompt "wolf howling at the moon" pairs well with its animation, offering a chorus of distant howls.

The harmonica example sounds a little too "uncanny valley" in the way its pitch shifts, but the underwater clip is solid, especially the jellyfish, which gets some extra prompts including "marine life" and "ocean."

The video with the prompt "drummer on a stage at a concert surrounded by flashing lights and a cheering crowd" is a bit off. The drummer's sticks seem focused on the snare and maybe a floor tom, while the audio sounds a bit more complex, with some other drums involved. Still, it's an impressive start for a project that is likely to improve with time.

Like many of Google's projects, this one hasn't been released yet and is just a research preview. Google says there are limitations and safety issues that need to be addressed first.

For example, because the quality of the audio output depends on the quality of the video input, video artifacts or distortions that fall outside the model's training distribution can cause a noticeable drop in audio quality.

The team is also working on lip-syncing video to generated speech, but current attempts aren't always accurate and can produce an uncanny valley effect.

Not to be outdone, ElevenLabs this week introduced its new text-to-sound-effects API, which lets you generate audio effects based on what you upload to it.

Unlike Google's V2A model, the ElevenLabs API is already accessible, and from my experiments it works surprisingly well.

In the example above, the bottle-smashing video gives you a few different options to choose from, while the DiCaprio laughing meme gets additional audio of others laughing in the room.

The company bootstrapped a quick app to demonstrate what's possible with the API, letting you upload videos and add sounds to them. It's free to use, open source, and you can try it now.

ElevenLabs told Tom's Guide that the real purpose is for other companies and developers to build things with the API, such as integrating it into generative video platforms.
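If you'd rather call the API directly than go through the demo app, a basic request looks roughly like the sketch below. The endpoint path and parameter names reflect ElevenLabs' public documentation at the time of writing, so treat them as assumptions and check the current API reference; the API key, prompt text, and output file name are placeholders.

```python
# Minimal sketch of calling ElevenLabs' sound-effects endpoint directly.
# Endpoint and parameter names are taken from ElevenLabs' public docs at
# the time of writing; verify against the current API reference.
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"  # placeholder; use your own key

resp = requests.post(
    "https://api.elevenlabs.io/v1/sound-generation",
    headers={"xi-api-key": API_KEY},
    json={
        "text": "glass bottle smashing on a concrete floor",  # describe the sound
        "duration_seconds": 3.0,   # optional; omit to let the model decide
        "prompt_influence": 0.5,   # optional; higher follows the prompt more literally
    },
)
resp.raise_for_status()

# The response body is the generated audio (MP3 bytes).
with open("smash.mp3", "wb") as f:
    f.write(resp.content)
```

One thing worth noting: the endpoint itself takes a text prompt, not a video. The upload-a-video workflow described above comes from the demo app, which appears to first turn the clip into a text description before generating the matching sound.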
