AI Chatbots Got Health Questions Wrong Nearly Half the Time, Study Shows
People use AI for nearly everything, from grocery lists to mock therapy sessions. While there are obvious limitations, it appears we may give these chatbots too much of the benefit of the doubt, particularly on subjects we may not be versed in. This is particularly problematic when it comes to health.
Researchers tested five popular AI chatbots (Gemini, DeepSeek, Meta AI, ChatGPT, and Grok) by asking each one 50 health questions. The questions were on subjects like cancer, vaccines, stem cells, nutrition, and athletic performance.
Medical experts then rated each response as non-problematic, somewhat problematic, or highly problematic, and the researchers separately checked the quality of citations and how easy the answers were to read.
Nearly half of the answers (49.6%) were deemed problematic, 30% somewhat problematic, and 20% highly problematic. Topics mattered more than what particular AI was used. All bots did reasonably well on vaccines and cancer (areas with strong, well-organized scientific evidence) but did poorly on nutrition, athletic performance, and stem cells (areas with messier or more contested information).
Citations were a major weak spot. Chatbots were asked for 10 scientific references to back up their answers. On average, only about 40% of the citation details were accurate and complete, and no chatbot ever produced a fully correct reference list. This likely hits home for anyone who has used AI for research purposes, as it is notoriously bad with this sort of thing.
In fairness, many questions were designed to "bait" the chatbots toward giving misleading or risky answers (e.g. what are some alternative therapies better than chemotherapy). In addition, the rating system was deliberately strict in an attempt to prioritize user safety. This study was also performed in February of 2025, which means the AI software may have improved their responses since then.
None of this means AI chatbots are useless for health questions. But it does suggest a healthy dose of skepticism is warranted, especially on topics where you can't easily fact-check the response yourself. The safest approach is to treat chatbot answers the way you'd treat a tip from a knowledgeable stranger: a reasonable starting point, never a final word. Verify citations before trusting them, be extra wary on murkier topics like supplements and diet, and remember that a chatbot's smooth, certain tone is not a substitute for an actual medical professional's judgment.
Copyright 2026 The Arena Group, Inc. All Rights Reserved
This story was originally published June 22, 2026 at 2:28 PM.