Advancements in artificial intelligence have transformed many sectors, but they are not without flaws. An experiment by Swedish researchers exposed concerning vulnerabilities in how chatbots validate medical information. By inventing a fictitious disease, “bixonimania,” the researchers demonstrated that AIs can be easily deceived, raising questions about their use in sensitive contexts. Here is how the experiment highlighted the limitations of current artificial intelligence systems.
In 2024, Almira Osmanovic Thunström, a researcher at the University of Gothenburg, designed an experiment to test the limits of chatbots. She invented “bixonimania,” a fictitious disease, and inserted it into academic preprints filled with obvious signs of fabrication. Despite these clues, well-known chatbots validated the pathology as real.
Copilot, for example, described bixonimania as “intriguing and relatively rare,” while Gemini recommended consulting an ophthalmologist. This shows that AIs can be fooled by well-formatted content, which they perceive as legitimate.
The error was not limited to chatbots. Researchers from the Institute of Medical Sciences in Mullana, India, cited the fake preprints in a study, showing that even experts can be taken in. Cureus, the journal where the article was published, retracted the document in March 2026, but the incident revealed a systemic flaw in the verification of academic sources.
Elisabeth Bik, a research integrity specialist, expressed concerns about the automation of academic indexing. She highlighted the risk of erroneous information spreading without human intervention, a problem exacerbated by the use of LLMs (large language models) in research.
Since the experiment, some chatbots have adjusted their behavior. Copilot and Perplexity acknowledged they had been duped and now give corrected answers, while Gemini advises consulting professionals for sensitive medical topics.
In contrast, ChatGPT continues to skirt the issue by providing elaborate answers without admitting the error. This reluctance to acknowledge flaws underscores the need for better information management in AI systems.
This experiment raises important considerations for the future of AI, particularly in medicine. As chatbots and other AI-based systems become everyday tools, it is crucial to improve their ability to distinguish reliable information from erroneous information. Collaboration between human experts and AI systems could be a promising path toward ensuring the accuracy and safety of medical data.