People increasingly rely on artificial intelligence (AI) for medical diagnoses because of how quickly and efficiently these tools can spot anomalies and warning signs in medical histories, X-rays and other datasets before they become obvious to the naked eye. But a new study published Dec. 20, 2024, in the BMJ raises concerns that AI technologies such as large language models (LLMs) and chatbots, like people, show signs of deteriorating cognitive abilities as they age.

“These findings challenge the assumption that artificial intelligence will soon replace human doctors,” the study’s authors wrote in the paper, “as the cognitive impairment evident in leading chatbots may affect their reliability in medical diagnostics and undermine patients’ confidence.”

Scientists tested publicly available LLM-driven chatbots including OpenAI’s ChatGPT, Anthropic’s Claude Sonnet and Alphabet’s Gemini using the Montreal Cognitive Assessment (MoCA) test — a series of tasks neurologists use to assess abilities in attention, memory, language, spatial skills and executive function.

MoCA is most commonly used to assess or test for the onset of cognitive impairment in conditions like Alzheimer’s disease or dementia. Subjects are given tasks like drawing a specific time on a clock face, starting at 100 and repeatedly subtracting seven, remembering as many words as possible from a spoken list, and so on. In humans, 26 out of 30 is considered a passing score (i.e., the subject has no cognitive impairment).


While some aspects of testing like naming, attention, language and abstraction were seemingly easy for most of the LLMs used, they all performed poorly in visual/spatial skills and executive tasks, with several doing worse than others in areas like delayed recall.

Crucially, while the most recent version of ChatGPT (version 4) scored the highest (26 out of 30), the older Gemini 1.0 LLM scored only 16 — leading to the conclusion that older LLMs show signs of cognitive decline.

The study’s authors note that their findings are observational only — critical differences between the ways in which AI and the human mind work mean the experiment cannot constitute a direct comparison. But they caution it might point to what they call a “significant area of weakness” that could put the brakes on the deployment of AI in clinical medicine. Specifically, they argued against using AI for tasks requiring visual abstraction and executive function.

It also raises the somewhat amusing notion of human neurologists taking on a whole new market — AIs themselves that present with signs of cognitive impairment.
