Hume EVI is an artificially intelligent text-to-speech voice assistant that may be more natural and intuitive than OpenAI's GPT-4o Advanced Voice, and it has now been updated to version 2.
The brainchild of Hume co-founder Alan Cowen and his team, EVI 2 builds on the previous generation model with more natural speech and better emotional understanding.
According to Hume, “EVI 2 can quickly converse with users with sub-second response times, understand their tone of voice, and generate any tone of voice.”
In my tests, I found it more natural than OpenAI's Advanced Voice, but slightly slower and with fewer features. For example, EVI has a more empathetic tone of voice, while ChatGPT is better at conveying laughter and other sounds associated with the human voice.
EVI 2, like ChatGPT Voice and Gemini Live, is an empathic voice assistant available as a dedicated smartphone app, online, or as an API that developers can use in their own projects.
Hume's EVI 2 stands out from the pack for its flexibility. It offers native speech synthesis with its own LLM brain, but that brain is interchangeable with other models such as GPT-4o and Gemini, and EVI can also be used to give a voice to Grok and Meta's Llama 3.1.
I spoke with Dr Cowen prior to the release of EVI 2, and he explained that his goal is to “give developers the tools to build what they want,” and that other players in the field are building an ecosystem around themselves. “We train on the open source model and give them a voice.”
“Developers can take this model and use any framework they want, voice modulation and personality voice,” he added. He also said that in the future there could be smaller versions of the model that run on edge devices, laptops, or even smart speakers.
Outside of the API and developer tools, the Hume AI app is an impressive experience, allowing you to have conversations, brainstorm ideas, or just get things off your chest with a natural-sounding AI voice that detects your tone of voice and responds accordingly.
“We are building a system that can automatically adapt the voice to the user, including adopting the appropriate accent, a more relaxed or formal personality, or whatever works to help you engage with the AI,” Dr Cowen told Tom's Guide.
In addition to the preset voices developed by Hume, EVI 2 is also capable of voice cloning, but that functionality is restricted. Instead, users can adjust voice characteristics related to identity without directly cloning an actual voice, which lets each user create custom voices.
Dr Cowen contrasted this with GPT-4o, which he said focuses on “shiny features,” whereas Hume concentrates on “things that developers actually need, such as the ability to modulate voices without cloning,” as he told me in an interview before the new model was announced.
Hume's approach to voice development is prompt-based: the user simply describes how they want the voice to sound and the AI renders it. “We come up with voice prompts and the AI can follow that personality.” It can even generate other languages and accents.
I tried out EVI 2 on the Hume AI website with several voices. It was impressively natural and was able to adapt its voice according to how I spoke.
It also excels as a storyteller, conveying the emotional depth of a character. While it rivals or exceeds the emotional mimicry of the ChatGPT voice, it lacks other features common to the human voice, such as breathing and pause sounds. Nevertheless, I was so absorbed in the conversation that I forgot it wasn't human.
For fun, I also tried having EVI 2 talk to ChatGPT Advanced Voice. They began chatting like old friends, talking about recipes and hobbies. I tried it with other AI models too, but the effect was limited.
What makes EVI 2 an important step forward is not its features, but the company's broad approach: while you might use Advanced Voice in ChatGPT or Gemini Live on an Android device, EVI can be embedded in any software or device.
Its ability to track emotional responses through tone of voice may also prove useful in the care sector, such as giving medical robots a bedside manner. Or it could replace the automated voice you hear while waiting for your call to be answered, soothing you when you are angry at being the 5 millionth person in the queue. It has to be better than the lie that “your call is important to us.”