Artificial Intelligence #ehr #clinical
EHRNote-ChatQA: New Benchmark Tests LLMs on Multi-Turn Clinical Question Answering
Researchers introduce EHRNote-ChatQA, the first benchmark for evidence-grounded, multi-turn clinical question answering over multiple discharge summaries. Built from MIMIC-IV data, it contains 967 patient-level samples and 16,072 QA pairs. Results show that LLMs struggle more with grounding their answers in evidence than with answering the questions themselves, and that errors compound across turns.
Jun 16, 2026 1 source