Artificial intelligence tools are more likely to provide incorrect medical advice when the misinformation comes from what the software considers to be an authoritative source, a new study has found. Researchers reported in The Lancet Digital Health that AI systems were more easily misled by realistic hospital documents than by informal online discussions.
In tests involving 20 open-source and proprietary large language models, the researchers found the software was more often fooled by errors embedded in doctors’ discharge notes than by incorrect claims drawn from social media conversations. The findings highlight growing concerns over the reliability of AI tools as their use expands across healthcare settings.
“Current AI systems can treat confident medical language as true by default, even when it’s clearly wrong,” said Dr Eyal Klang of the Icahn School of Medicine at Mount Sinai in New York, who co-led the study. “For these models, what matters is less whether a claim is correct than how it is written.”
AI accuracy is becoming a pressing issue in medicine, where a growing number of mobile applications claim to assist patients with medical questions, while clinicians increasingly rely on AI-enhanced systems for tasks ranging from transcription to surgical planning. Although many patient-facing tools are not designed to provide diagnoses, researchers warn that flawed outputs could still influence health decisions.
Klang and his colleagues tested the AI systems using three types of content: authentic hospital discharge summaries containing a single fabricated recommendation; widely shared health myths taken from Reddit; and 300 short clinical scenarios written by physicians. The models were then exposed to more than one million prompts in which users asked questions or sought guidance based on the material.
Across all sources, the AI tools accepted and repeated false information about 32% of the time. When the misinformation appeared in what looked like an official hospital note, that figure rose sharply to nearly 47%, according to Dr Girish Nadkarni, chief AI officer of the Mount Sinai Health System and a co-leader of the study.
By contrast, AI systems were far more sceptical of social media content. When incorrect claims originated from Reddit posts, the likelihood of the models passing on the misinformation dropped to just 9%, Nadkarni said.
The study also found that prompt wording influenced outcomes. AI systems were more inclined to endorse incorrect information when questions were framed in an authoritative tone, such as when a user claimed to be a senior clinician endorsing a recommendation.
Among the models tested, OpenAI’s GPT systems were the least susceptible to false claims, while others accepted up to 63.6% of incorrect statements. Nadkarni said AI still holds promise for improving care but warned safeguards are essential before wider deployment. Separate research published in Nature Medicine recently found that asking AI about symptoms was no more helpful for patients than a standard internet search.
