Artificial Intelligence #ai #clinical ai
Study: LLM Accuracy Declines Predictably as Reasoning Steps Increase in Clinical AI Tasks
A study posted on arXiv introduces a hop-count taxonomy for predicting where LLMs fail on clinical question answering. In tests across Claude and GPT models, accuracy declines monotonically as reasoning depth increases, and extended thinking does not flatten the curve.
Jun 16, 2026 1 source