AI transcription tool used in hospitals invents things no one has ever said, researchers say

SAN FRANCISCO (AP) — Technology giant OpenAI has claimed that its AI-powered transcription tool Whisper has “near human-level robustness and accuracy.”

But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Some of the invented text, known in the industry as hallucinations, can include racist comments, violent rhetoric and even imaginary medical treatments, these experts said.

Experts said such fabrications are problematic because Whisper is used in numerous industries around the world to translate and transcribe interviews, generate text in popular consumer technologies and create captions for videos.

More concerning, they said, is a rush by medical centers to use Whisper-based tools to transcribe patients’ consultations with doctors, despite OpenAI’s warnings that the tool should not be used in “high-risk domains.”

The full extent of the problem is difficult to discern, but researchers and engineers said they frequently came across Whisper’s hallucinations in their work. A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight out of every 10 audio transcriptions he inspected, before he started trying to improve the model.

One machine learning engineer said he initially discovered hallucinations in about half of the more than 100 hours of Whisper transcripts he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper.

The problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio snippets they examined.

That trend would lead to tens of thousands of faulty transcriptions across millions of recordings, the researchers said.

___

This story was produced in partnership with the Pulitzer Center’s AI Accountability Network, which also partially supported the academic Whisper study. The AP also receives financial assistance from the Omidyar Network to support coverage of artificial intelligence and its impact on society.

___

Such mistakes could have “really grave consequences,” particularly in hospital settings, said Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year.

“Nobody wants a misdiagnosis,” said Nelson, a professor at the Princeton Institute for Advanced Study in New Jersey. “There has to be a higher bar.”

Whisper is also used to create closed captioning for the deaf and hard of hearing, a population at particular risk for faulty transcriptions. That’s because deaf and hard-of-hearing people have no way of identifying fabrications “hidden among all this other text,” said Christian Vogler, who is deaf and directs Gallaudet University’s Technology Access Program.

OpenAI urged to solve problem

The prevalence of such hallucinations has led experts, advocates and former OpenAI employees to call on the federal government to consider AI regulations. At the very least, they said, OpenAI should fix this flaw.

“This seems solvable if the company wants to prioritize this,” said William Saunders, a San Francisco-based research engineer who left OpenAI in February over concerns about the company’s direction. “If you put that out there and people become overconfident about what it can do and integrate it into other systems, it becomes problematic.”

An OpenAI spokesperson said the company continually studies how to reduce hallucinations and appreciated the researchers’ findings, adding that OpenAI incorporates feedback into its model updates.

While most developers assume transcription tools misspell words or make other errors, engineers and researchers said they’ve never seen another AI-powered transcription tool hallucinate as much as Whisper.

Whisper hallucinations

The tool is integrated into some versions of OpenAI’s flagship chatbot, ChatGPT, and is a built-in offering on cloud computing platforms from Oracle and Microsoft that serve thousands of companies worldwide. It is also used to transcribe audio and translate it into multiple languages.
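For readers curious how developers typically reach the hosted tool, the sketch below shows a minimal transcription call through OpenAI’s Python SDK; the audio file name and surrounding setup are illustrative assumptions, not details from the reporting.

```python
# Minimal sketch of a hosted Whisper transcription call via OpenAI's
# Python SDK (openai >= 1.0). The file "visit.wav" is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

with open("visit.wav", "rb") as audio_file:
    # "whisper-1" is the hosted speech-to-text model name in OpenAI's API.
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```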

Last month alone, a new version of Whisper was downloaded more than 4.2 million times from open-source AI platform HuggingFace. Sanchit Gandhi, a machine learning engineer there, said Whisper is the most popular open-source speech recognition model and is integrated into everything from call centers to voice assistants.
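Those Hugging Face downloads correspond to running the openly released model locally, along the lines of this minimal sketch using the transformers pipeline; the checkpoint and file name are again illustrative assumptions.

```python
# Minimal sketch of local transcription with an open-source Whisper
# checkpoint via Hugging Face's transformers pipeline. "meeting.wav" is a
# placeholder; long recordings may also need chunking options.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # one of the openly released checkpoints
)

result = asr("meeting.wav")  # returns a dict such as {"text": "..."}
print(result["text"])
```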

Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets obtained from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that nearly 40% of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented.

In one example they uncovered, a speaker said: “That kid, I’m not exactly sure, was going to take the umbrella.”

But the transcription software added: “He took a big piece of a cross, a teeny tiny piece… I’m sure he didn’t have a terrorist knife, so he killed a lot of people.”

A speaker in another recording described “two other girls and one lady.” Whisper invented extra commentary on race, adding: “two other girls and one lady, um, which were Black.”

In a third transcript, Whisper invented a non-existent medication called “hyperactivated antibiotics.”

Researchers don’t know why Whisper and similar tools hallucinate, but software developers say the hallucinations often occur during pauses, background noise, or while music is playing.
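That observation points to one mitigation some developers try: gating out near-silent stretches before audio reaches the model. The sketch below is a naive, energy-based illustration under that assumption; the frame size and threshold are arbitrary placeholders, and nothing here is a method endorsed by OpenAI or the researchers.

```python
# Illustrative only: a naive energy-based silence trimmer, on the hedged
# theory that long pauses invite hallucinated text. Expects float samples
# in [-1, 1]; frame size and threshold are arbitrary placeholder values.
import numpy as np

def trim_silence(samples: np.ndarray, rate: int,
                 frame_ms: int = 30, threshold: float = 0.01) -> np.ndarray:
    """Keep only frames whose root-mean-square energy exceeds the threshold."""
    frame_len = int(rate * frame_ms / 1000)
    kept = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
        if np.sqrt(np.mean(samples[i:i + frame_len] ** 2)) > threshold
    ]
    return np.concatenate(kept) if kept else samples[:0]
```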

In its online disclosures, OpenAI has recommended against using Whisper in “decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes.”

Transcribing doctor appointments

That warning hasn’t stopped hospitals or medical centers from using speech-to-text models, including Whisper, to transcribe what is said during doctor visits so medical providers can spend less time taking notes or writing reports.

More than 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children’s Hospital Los Angeles, have started using a Whisper-based tool built by Nabla, which has offices in France and the U.S.

Martin Raison, Nabla’s chief technology officer, said the tool is fine-tuned on medical language to transcribe and summarize patients’ interactions.

Company officials said they were aware that Whisper could hallucinate and were trying to find a solution to the problem.

Raison said it is impossible to compare Nabla’s AI-generated transcript to the original recording because Nabla’s tool deletes the original audio “for data security reasons.”

The tool has been used to transcribe an estimated 7 million medical visits, Nabla said.

Former OpenAI engineer Saunders said deleting original audio could be concerning if transcripts aren’t double-checked or if clinicians can’t access the recording to verify its accuracy.

“If you take away the ground truth, you can’t catch errors,” he said.

Nabla said that no model is perfect, and that theirs currently requires medical providers to quickly edit and approve transcribed notes, but that could change.

Privacy concerns

Because patients’ conversations with their doctors are confidential, it is difficult to know how AI-generated transcripts affect them.

California state lawmaker Rebecca Bauer-Kahan said she took one of her children to the doctor earlier this year and refused to sign a form the health network provided seeking her permission to share the consultation audio with vendors that included Microsoft Azure, the cloud computing system run by OpenAI’s largest investor. Bauer-Kahan said she didn’t want such intimate medical conversations being shared with tech companies.

“The release was very specific that for-profit companies would have the right to have this,” said Bauer-Kahan, a Democrat who represents part of the San Francisco suburbs in the State Assembly. “I said, ‘Absolutely not.’ ”

John Muir Health spokesman Ben Drew said the health system complies with state and federal privacy laws.

___

Schellmann reported from New York.

___

The AP is solely responsible for all content. Find the AP’s standards for working with philanthropies, a list of supporters and funded coverage areas at AP.org.

___

The Associated Press and OpenAI have a licensing and technology agreement that allows OpenAI access to part of the AP’s text archives.