The generative AI wars are reaching a climax as more and more companies release their own models. Generative video seems to be the biggest battleground at the moment, but Genmo is taking a different approach.
The company is releasing its Mochi-1 model as a “research preview,” but the new video generation model is open source under the Apache 2.0 license, so it can be freely taken apart, modified, and reassembled.
In other words, Mochi-1 is free to use and can be tried on Genmo's website. Because it is open source, it should also show up on mainstream generative AI platforms in the future, and may one day run on a decent gaming PC.
Genmo enters a very competitive market where different services offer different features, such as templates from Haiper, realism from Kling and Hailuo, and fun effects from Pika Labs and Dream Machine. Genmo's focus is on bringing the cutting edge to open source.
So why use Genmo's model over any other currently on offer? It comes down to motion: when we spoke with Paras Jain, CEO of Genmo, he explained that motion is a key metric when benchmarking models.
“For a very long time, I think the only videos that are basically uninteresting are videos that don't move. And I felt like a lot of AI videos suffered from this 'Live Photo' effect,” he explained. “Our earlier models had this, and I think that's how we evolved the technology. But motion, above all, was the most important thing we invested in.”
This first release is a surprisingly small 10-billion-parameter diffusion transformer model that uses a new asymmetric architecture to pack a lot of punch into a small package.
According to Jain, Mochi-1 was trained on video only, rather than the traditional mix of video, images, and text. This allowed Mochi-1 to gain a deeper understanding of physics.
The team then worked on making sure the model correctly understood what people wanted it to make. He said: “We've invested really, really heavily in prompt adherence, as well as following what you say.”
Genmo hopes that Mochi-1 can provide “best-in-class” open source video generation, but for now, video is limited to 480p as part of the research preview launched today.
As Jain mentions, there is also a big focus on prompt adherence and alignment; Genmo benchmarks this following OpenAI's DALL-E 3 protocol, with a vision language model acting as a judge.
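Genmo has not published the exact scoring code, but the general “vision language model as a judge” pattern for measuring prompt adherence can be sketched roughly as below. The functions here are hypothetical placeholders, not Genmo's or OpenAI's actual APIs; in a real setup they would call a video generator and a vision language model.

```python
# Rough sketch of prompt-adherence benchmarking with a VLM as judge.
# generate_video() and vlm_score() are placeholder stubs for illustration only.

from statistics import mean


def generate_video(prompt: str) -> list:
    """Placeholder: return a list of sampled frames for the prompt."""
    return [f"frame_{i}_for_{prompt}" for i in range(4)]


def vlm_score(frames: list, prompt: str) -> float:
    """Placeholder: a vision language model would rate (e.g. on a 1-5 scale)
    how closely the sampled frames follow the text prompt."""
    return 4.0  # stubbed score


# A benchmark is just a fixed set of prompts; the judge's ratings are averaged.
prompts = [
    "a corgi surfing a wave at sunset",
    "time-lapse of a city street in the rain",
]

scores = []
for prompt in prompts:
    frames = generate_video(prompt)
    scores.append(vlm_score(frames, prompt))

print(f"mean prompt-adherence score: {mean(scores):.2f}")
```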
Would you try Mochi-1? Let us know. Mochi-1 is certainly entering a crowded field, but thanks to its open-source nature, it has the potential to spread further than some of its rivals.
Mochi-1 is not the only open-source AI video model announced this week; AI firm Rhymes announced Allegro as a “small, efficient, open-source text-to-video model.”
Where Mochi-1 generates 480p video at 24 frames per second, Allegro produces 720p at 15 frames per second, and it is also available under the Apache license. Neither model runs on your laptop yet, but as Jain said, the beauty of open source is that someday someone will tweak it to run on lower-end hardware, and we'll be generating video offline.