OpenAI is finally bringing Advanced Voice mode to the desktop, available in both the Windows and Mac versions of the ChatGPT app, and it works just like the mobile release.
In other words, you will finally be able to have a conversation with your computer, not in the way you would talk to Siri or Alexa (yes, both were triggered when I dictated this draft), but a full conversation as if you were talking to another human being.
Advanced Voice is built on native speech-to-speech processing. This means that OpenAI's voice bot can understand not just what you say but how you say it, even the pauses between words. It responds in the same natural way, adding vocal tics such as “um” and audible breaths between sentences.
We still don't have everything promised in OpenAI's spring update, namely screen sharing and live video in ChatGPT, but those features are on the way, and desktop support is yet another major upgrade over earlier speech models.
To access Advanced Voice in the desktop app, click the icon in the chat bar, just as you would on iOS or Android. Clicking the button opens a new view with the now-familiar blue gradient circle.
You can continue talking with the AI while you work on other tasks. It cannot see what you are doing, but it can respond to your descriptions of what's on screen. For example, if you use it while playing “Minecraft” and describe a scene, it can suggest the types of buildings and blocks to use. Bringing Advanced Voice to the desktop is the next logical step for OpenAI, further solidifying ChatGPT as a complete productivity platform rather than a gimmick. Conversing with the AI lets you brainstorm ideas and work through tasks that would be hard to tackle alone.
In the future, you will be able to share your screen with Advanced Voice so it can see what you are doing. And someday, with the rise of AI agents, it may even be able to control your screen and walk you through certain processes.
Advanced Voice is an incredibly useful tool, but the underlying Realtime API is more powerful still. This is the back end of Advanced Voice, and developers can use it to build their own versions or incorporate voice into their own tools.
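To give a sense of how lightweight that integration can be, here is a minimal sketch of connecting to the Realtime API over a WebSocket and requesting a response. It assumes the third-party `websockets` Python package, and the endpoint, model name, and event names reflect the beta documentation at the time of writing, so they may change.

```python
# Minimal sketch: open a Realtime API session and stream back a text reply.
# Endpoint and event names are from the beta docs and may change.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # On websockets < 14, pass extra_headers=HEADERS instead.
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Ask the model for a text response (audio is also supported).
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Briefly describe Saturn's rings.",
            },
        }))
        # Read server events until the response finishes.
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                print()
                break

asyncio.run(main())
```

A real voice app would stream microphone audio in and play audio deltas back out, but the text-only exchange above shows the event loop at the heart of it.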
At a recent briefing I had with the OpenAI team, their developer liaison lead, Romain Huet, showed an impressive demonstration built around a model of the solar system. He could instruct the system to move between planets by voice, and it provided real-time insights into the nature of each world he visited and answered questions in a conversational manner.
Another demo used it as a virtual travel agent that could not only book flights but also help you find the best deals. You state your requirements, and the agent asks clarifying questions and follows up based on what is actually available, rather than the rigid logic-tree approach of today's automated phone systems.
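A demo like that presumably relies on the Realtime API's function-calling support to reach live flight inventory. Here is a rough sketch of registering a tool on the session; `search_flights` is a hypothetical backend function invented for illustration, not anything OpenAI ships.

```python
# Hedged sketch: register a hypothetical flight-search tool on the session
# so the model can ask clarifying questions, then call out for live data.
import json

session_update = {
    "type": "session.update",
    "session": {
        "tools": [{
            "type": "function",
            "name": "search_flights",  # hypothetical backend function
            "description": "Search available flights and return fares.",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "date": {"type": "string", "description": "YYYY-MM-DD"},
                },
                "required": ["origin", "destination", "date"],
            },
        }],
        "tool_choice": "auto",
    },
}

# Sent over the same WebSocket as in the earlier sketch:
# await ws.send(json.dumps(session_update))
```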
All of these features will begin to roll out not only in OpenAI's apps, but also in apps from other developers in the coming months and years. Voice will be the new way we all interact with computers.
Now I just need to find better dictation software that doesn't require me to spend hours reviewing everything I have dictated and correcting glaring mistakes.