TechingToday

OpenAI's Latest Tool Can Reproduce a Human Voice From Just 15 Seconds of Audio - Meet Voice Engine


OpenAI has announced a new tool that recreates a person's voice with just 15 seconds of recorded audio.

Named Voice Engine, the model takes a single 15-second clip and learns both the sound of the person's voice and the way they speak. From there, a user can type in text and have that voice say whatever they want, in realistic speech that conveys emotion. The company says it developed Voice Engine in 2022 and has used it with preset voices, but this is the first time it has discussed using it to reproduce a real person's voice. In a blog post on Friday (March 29), OpenAI also acknowledged the technology's obvious potential for malicious use.

"Because of the potential for abuse of the synthetic voice, we are taking a cautious and informed approach to the broader release," OpenAI wrote in its blog post.

"We are eager to begin a dialogue about the responsible deployment of synthetic speech and how society can adapt to these new capabilities."

OpenAI added that it will decide how, or whether, to release the Voice Engine to the public based on how these conversations proceed.

The company wrote, "We will make more informed decisions about whether and how to deploy this technology on a large scale."

The implications of Voice Engine are enormous. It could be put to many beneficial uses, such as quickly recording presentations or helping people communicate more effectively, but it would also make it easy to capture someone else's voice and use it for malicious purposes. Scams of this kind already exist, tricking people into sending money or sharing personal information with fraudsters.

OpenAI argues that these risks are exactly why gathering feedback matters. The company said it has engaged with governments, media companies, entertainment companies, and educational institutions in the U.S. and abroad to discuss Voice Engine. These partners are currently testing it and have agreed not to use it to impersonate others. They must also disclose to listeners that the voice they are hearing is AI-generated, and OpenAI has added watermarking so that audio generated by Voice Engine can be traced back to its source.

"We believe that the broader deployment of synthetic voice technology should be accompanied by a voice authentication experience that ensures that the original speaker is knowingly adding their own voice to the service and a list of prohibited voices that detects and prevents the creation of voices that sound too much like celebrities," the company said.

What Voice Engine will look like in the future is unknown. It may eventually be released to the public, or OpenAI may decide that a release is not in the public interest. Either way, the company said, the technology is clearly here to stay: "It is important that people around the world understand where this technology is headed."