OpenAI CEO Sam Altman said that the first users will have access to GPT-4o Advanced Voice in a few weeks, but that this will be a limited “alpha” rollout.
The company is testing all the features of GPT-4o, a new “omni” model released during its Spring Update in May; unlike GPT-4, this natively multimodal model can understand speech directly, without first converting it to text.
This makes GPT-4o faster and more accurate as a voice assistant, and it can even pick up on tone and vocal intonation during conversations.
Users are patiently waiting for access, but OpenAI says that safety testing must first be completed. While some users have temporary access and there have been several demonstrations of its capabilities, most users will not be able to access it until later this year.
GPT-4o Advanced Voice is a completely new type of voice assistant, similar to the recently announced French model Moshi, but larger.
In a demo of this model, we saw GPT-4o Advanced Voice create custom character voices, generate sound effects while telling stories, and even act as a live translator.
This native speech capability is an important step toward creating a more natural AI assistant. In the future, live vision capabilities will also be included, allowing the AI to see what you see.
Another use case for Advanced Voice is acting as a very patient language teacher, directly correcting pronunciation and helping users improve their accent.
“ChatGPT's Advanced Voice mode can understand and respond to emotional and nonverbal cues, bringing it closer to a real-time natural conversation with an AI. Our mission is to thoughtfully bring these new experiences to you,” OpenAI said in a statement last month.
OpenAI is one of the most cautious artificial intelligence labs, spending significant time on safety testing and validation, and on putting guardrails in place for major new models.
Altman has also called for regulation of frontier models like the forthcoming GPT-5, as well as world models like Sora, because of the risks they pose to society. While OpenAI takes this cautious approach, other companies are beginning to catch up, and GPT-4 is no longer the only top-notch model.
The company was concerned that GPT-4o Advanced Voice, without proper guardrails, could provide potentially damaging information or be used in unexpected ways. To address this, it is releasing the feature gradually, first to trusted users and then more widely over time.
“As part of an iterative rollout strategy, we will start with a small group of users in alpha, gather feedback, and expand based on what we learn,” an OpenAI spokesperson explained.
“We plan to make it accessible to all Plus users in the fall. The exact timeline depends on meeting our high standards of safety and reliability. We are also working on rolling out the new video and screen-sharing features that we have demoed separately, and will keep you posted on the timeline.”