Researchers have found that OpenAI's speech transcription tool "Whisper" fabricates text that was never said, with potentially dangerous consequences.
According to the Associated Press, the AI model invents text, a phenomenon commonly referred to as "hallucination." U.S. researchers found that Whisper's fabrications include racial commentary, violent rhetoric, and invented medical treatments. Whisper is integrated into some versions of ChatGPT and built into Microsoft's and Oracle's cloud computing platforms. Although OpenAI has warned that the tool should not be used in high-risk domains, healthcare providers have begun employing it to transcribe patients' consultations with physicians.
OpenAI claims that Whisper has "near human-level robustness and accuracy," and the tool has been adopted by more than 30,000 U.S. clinicians across 40 health systems. However, researchers have found problems in multiple studies and have warned against its adoption.
Researchers at the University of Michigan who studied public meetings found that Whisper hallucinated in 8 out of 10 audio transcriptions. Meanwhile, a machine learning engineer found hallucinations in about half of more than 100 hours of transcriptions he reviewed, and a third developer found them in nearly all of the 26,000 transcripts he produced with Whisper.
In the past month, Whisper has been downloaded more than 4.2 million times from the open-source AI platform HuggingFace, making it the most popular speech recognition model on the site. Analyzing material from TalkBank, a research repository hosted at Carnegie Mellon University, researchers determined that about 40% of the hallucinations Whisper generated were harmful or concerning because the speaker could be misinterpreted or misrepresented.
In one example cited by the AP, a speaker described "two other girls and one woman," and Whisper added unprompted racial commentary, producing "two other girls and one woman, um, that was black." In another instance, the tool invented a non-existent medication it called "hyperactivated antibiotics."
Alondra Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey, told the Associated Press that mistakes like those found could have "really serious consequences," especially in medical settings, as "nobody wants a misdiagnosis."
As former OpenAI engineer William Saunders told the Associated Press, there are calls for the company to address the issue.
While many users expect AI tools to make mistakes and misspell words, researchers note that Whisper is not the only program to produce such errors.
Google's AI "Overviews" feature drew criticism earlier this year when it suggested using non-toxic glue to keep cheese from sliding off pizza, citing a sarcastic Reddit comment.
In an interview, Apple CEO Tim Cook acknowledged that AI hallucinations could be a problem in future products, including the Apple Intelligence suite. Cook told the Washington Post that he could not say with 100% certainty that the tool would not hallucinate.
“I think we've done everything we know how to do, including thinking very deeply about the readiness of the technology in the areas where we're using it,” Cook said.
Despite this, companies continue to develop more AI tools and programs, and hallucinations like Whisper's inventions remain prevalent. OpenAI's own guidance is to avoid using Whisper in "decision-making contexts, where flaws in accuracy can lead to significant flaws in results."