ChatGPT Is ‘Really Bad’ at Being a Doctor, Fails Miserably in Medical Test

Can you trust ChatGPT with your health-related queries? A recent study published in the journal PLOS ONE suggests that might not be a good idea.

Researchers tested ChatGPT, specifically its large language model GPT-3.5, by presenting it with 150 medical cases from Medscape, a respected online resource for medical professionals.

These cases had been accurately diagnosed by human doctors, and the researchers used only cases published after August 2021 to ensure they weren’t included in ChatGPT’s training data.

ChatGPT was given detailed patient histories, examination findings, and lab results, the same kind of information human physicians use to make diagnoses. It then had to choose the correct diagnosis from four multiple-choice answers and explain its reasoning, sometimes providing citations.

The results were less than impressive. ChatGPT made the correct diagnosis only 49 percent of the time and delivered “complete and relevant” answers just 52 percent of the time.

The chatbot did better at ruling out incorrect choices, with an overall accuracy rate of 74 percent, but it frequently struggled to identify the correct diagnosis.
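The gap between the 49 percent and 74 percent figures may look odd, but simple arithmetic can account for it. The sketch below is only an illustration, not the study's method. It assumes that ChatGPT picked exactly one answer per case and that the 74 percent figure scores each of the four options separately, neither of which is spelled out here. The function name `per_option_accuracy` is made up for this example.

```python
def per_option_accuracy(top_choice_rate: float, options: int = 4) -> float:
    """Share of individual answer options judged correctly.

    Assumption (not stated in the article): exactly one option is picked
    per case, and every option counts as its own accept/reject decision.
    """
    # Right pick: the true answer is accepted and all others are rejected,
    # so every option in the case is judged correctly.
    right = top_choice_rate * options
    # Wrong pick: two options are misjudged (the wrong one is accepted and
    # the true one is rejected); the remaining options are correctly rejected.
    wrong = (1 - top_choice_rate) * (options - 2)
    return (right + wrong) / options


# 49 percent correct diagnoses, four options per case
print(f"{per_option_accuracy(0.49):.1%}")
```

Under those assumptions the result is about 74.5 percent, in line with the reported figure. It also shows that picking the wrong answer every time would still score 50 percent on this measure, which is why the 49 percent diagnosis rate is the more telling number.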

Study co-author Amrit Kirpalani, an assistant professor at Western University, emphasized the importance of educating the public about the limitations of such AI tools. Kirpalani noted, “They should not replace your doctor yet.”

One major issue the researchers identified was ChatGPT’s difficulty with interpreting numerical values and medical images—key components in accurate medical diagnostics.

The AI also sometimes hallucinated, producing plausible-sounding but false information, or ignored critical details in a case.

Despite these shortcomings, there is still potential for AI in healthcare. Kirpalani suggested that AI could be valuable for educating trainee doctors and supporting experienced healthcare providers in decision-making, streamlining administrative tasks, and enhancing patient engagement.

It’s clear that while AI like ChatGPT shows promise in various fields, it is not yet ready to replace the critical decision-making skills of human doctors.

As AI continues to develop, it may become a useful adjunct in medical practice, but for now, the final call in medical diagnostics should remain firmly in the hands of human healthcare providers.