Artificial Intelligence #llms #aphasia
Do LLMs Reliably Identify Correct Information Units in Aphasic Discourse? A New Study Evaluates Four Models
A study examined whether instruction-tuned large language models (LLMs) can reliably perform token-level classification of Correct Information Units (CIUs) in aphasic discourse transcripts. Four models (Llama-3.1-8B, Qwen2.5-7B, Mistral-7B, and Phi-3-mini) were tested under zero-shot and few-shot prompting conditions. With few-shot prompting, three of the four models reached competitive mean F1 scores between 0.776 and 0.817, whereas zero-shot prompting was insufficient and Phi-3-mini was unstable. The authors recommend a human-in-the-loop approach for automated CIU scoring.
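To make the reported metric concrete, here is a minimal sketch of how a token-level F1 score for the CIU class could be computed from aligned gold and predicted labels. It assumes binary labels (1 = CIU, 0 = not a CIU) and scores a single transcript; the function name `token_f1`, the example tokens, and the labels are all hypothetical and not taken from the study, and the summary does not say how the authors averaged F1 across transcripts or runs.

```python
from typing import Sequence


def token_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 for the positive (CIU) class over aligned token labels.

    Assumes 1 = CIU and 0 = non-CIU; this labeling scheme is an
    illustration, not the study's documented format.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # CIU correctly found
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # non-CIU marked as CIU
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # CIU that was missed

    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 to avoid dividing by zero

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example utterance with a filler ("um") and a repetition ("is is").
tokens = ["um", "the", "boy", "is", "is", "kicking", "the", "ball"]
gold = [0, 1, 1, 1, 0, 1, 1, 1]  # hypothetical human annotation
pred = [0, 1, 1, 1, 1, 1, 1, 0]  # hypothetical model output

score = token_f1(gold, pred)
print(f"token-level F1: {score:.3f}")
```

In this invented case the model marks one repeated word as a CIU and misses one true CIU, so precision and recall are both 5/6. Disagreements of this kind are what a human-in-the-loop review step would be meant to catch.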
Jun 16, 2026 1 source