TechingToday

We tested Hailuo's new MiniMax image-to-video model


Hailuo MiniMax launched earlier this year and quickly became one of the best text-to-video artificial intelligence models on the market, offering realistic motion and high-quality video rendering.

While I found its quality to be good, the lack of an image-to-video model limited its usefulness. It also had slow response times, and while the motion was consistently good, its realism did not live up to the hype.

The company has been developing the model rapidly, adding a new dedicated English-language website and community. The latest upgrade finally introduces an image-to-video model that gives users more control over how their videos look.

I tested it with a series of prompts.

To get the most out of the image-to-video model, we need to start with good images.

We came up with five fun prompts that required various degrees of movement, and with the help of ChatGPT we refined them to be as descriptive as possible.

We then passed the images to MiniMax along with either a short custom motion prompt or the full prompt.

This prompt tests the model's ability to handle complex motion in an unusual physical environment: a dust storm in the low gravity of Mars.

Image prompt: “Capture a lone astronaut walking on Mars in a dust storm in a dramatic cinematic style. The composition places the astronaut in the center of the frame, silhouetted against a swirling red cloud of dust. The lighting is dim and diffused, with sunlight barely penetrating the storm. The color palette is dominated by warm, rusty tones of red and orange, creating an atmosphere that is both hostile and awe-inspiring. The mood is adventurous and ominous at the same time, evoking a sense of isolation in an exotic landscape. The shot is taken from a low angle, emphasizing the smallness of the astronaut against the vast Martian terrain.”

There are subtle details in the background, such as wind-eroded rocks.

Motion prompt: “Astronaut running through a sandstorm on Mars.”

A common test prompt I try with Runway and Kling is someone speaking. Here, I asked the AI to generate an image of a woman speaking and then animate it.

Image prompt: “A young woman in lively conversation, rendered in a lively street photography style. The composition captures her at a three-quarter angle and uses a shallow depth of field to focus on her expression while blurring the busy street behind her. The golden-hour natural light casts a warm glow on her face, accentuating her happy expression. The color palette is a mix of warm yellow and soft blue, conveying energy and life. The mood is lively and spontaneous, with a candid storytelling quality, and the use of a 50mm lens ensures a natural perspective, drawing the viewer into her conversation, while small details such as the pedestrians in the background and the soft blur of light add to the urban atmosphere.”


Motion prompt: “Conversation.”

One of the first “good” AI images I saw was of a dog prancing on a beach, and one of Sora's best demo videos was of a dog playing. So I had Flux create an image of a dog in motion and used Hailuo to make it really move.

Image prompt: “Capture a happy dog playing on the beach in a whimsical, painterly style. The composition places the dog in the middle of the action, jumping up to catch a thrown ball, the spray of seawater frozen in mid-air. The lighting is bright and golden, indicating late afternoon, with the sun low on the horizon casting long shadows. The color palette is warm sandy browns, the azure of the ocean, and lots of golden highlights, enhancing the playful atmosphere. The carefree, energetic mood evokes happiness and freedom. The scene is shot from a slightly lower vantage point to highlight the dog's enthusiasm, with technical details centered on motion blur to convey a sense of movement, and the gentle waves in the background add to the coastal setting.”

Motion prompt: “Smartphone camera, dog bouncing on beach.”

Drone light shows can be magical, but they are limited in scope by cost and the complexity of coordinating a swarm's motion. AI video can do better. For this test, the result was left entirely to the image and the model, with no text prompt.

Image prompt: “An amazing drone light display over London, rendered in a futuristic neon-inspired style. Illuminated drones form intricate patterns over iconic landmarks such as Tower Bridge and The Shard. The lighting is entirely artificial, with bright multicolored lights from the drones against the night sky contrasting with the warm city lights below. The color palette includes vibrant blues, purples, and greens, creating a futuristic and fantastical atmosphere. The mood is one of wonder and surprise, stirring the viewer's imagination. The image is shot from a high vantage point, looking down slightly on the cityscape, with technical details such as long exposures to create light trails and shimmering reflections on the Thames.”

Every AI video model I have tried has struggled with vehicle movement. So let's see how well this one handles a not particularly good image of a sports car racing by.

Image prompt: “Draw a sleek racing car speeding down a winding mountain road in an ultra-realistic style. Motion blur in the background emphasizes the speed. The lighting is natural, with sunlight filtering through the trees casting shadows on the road. The color palette contrasts the bright red of the car with the lush green of the surrounding forest and the subdued gray of the asphalt. The mood is intense and exhilarating, evoking the thrill of a high-speed race. A dynamic side angle, shot almost level with the car, expresses movement and agility. Technical details such as sharp focus on the car and controlled depth of field highlight the precision and power of this scene.”

Motion prompt: “Fixed camera, car rushing off into the distance.”

Hailuo MiniMax was already impressive. While waiting for these clips to finish generating, I looked back at some of the text-to-video generations I had created in the past, as well as examples from others, and they are among the best I have seen. Image-to-video takes it up another notch.

One thing that really stood out to me was how consistent the movement stayed throughout the six seconds of video generated per prompt. I was amazed at how well the model handled the hand movements in the “woman speaking” test.

It's not entirely perfect. The ball disappears, the dog seems to change breeds along the way, and the astronaut does a jig at the beginning. But it's better than most AI video models I've tried.