Cough Regression Benchmark Reveals Trade-Offs in Respiratory Acoustic Foundation Models

A new benchmark from researchers at NC State evaluates five respiratory acoustic foundation models on cough regression tasks—predicting age, BMI, and disease probability from cough audio. The study reveals that smaller MLP heads often outperform linear probes, but full-MLP heads overfit on small clinical data. HeAR and M2D+Resp achieve near-full performance with only 50 samples, while OPERA models require 400. Cross-dataset transfer is asymmetric, with large diverse datasets generalizing better to small clinical populations.

iGEN Editorial

June 16, 2026

Respiratory acoustic foundation models (FMs) have demonstrated strong performance in cough classification, that is, determining whether a cough indicates disease. However, their ability to predict continuous health quantities from cough audio, such as age, BMI, or disease probability, remains largely unexplored. This regression capability has clinical value in settings where physical measurements are unavailable, since it enables passive health monitoring. A new preprint by researchers including Mayur Sanap, Prasanna Desikan, and Edgar Lobaton introduces the first multi-model, multi-target cough regression benchmark to evaluate these models.

The Cough Regression Benchmark

The benchmark assesses five foundation models—OPERA-CT, OPERA-CE, OPERA-GT, HeAR, and M2D+Resp—across six targets (age, BMI, disease probability on multiple datasets) under subject-disjoint protocols. The models are tested on three datasets: Coswara, CIDRZ, and CoughVID. Three types of regression heads are compared: linear probing, a small multi-layer perceptron (MLP-small), and a full MLP.
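A subject-disjoint protocol means that no person contributes recordings to both the training and the test set, so a model cannot score well simply by recognizing a speaker. The following is a minimal sketch of that idea, not the benchmark's actual splitting code; the function name, the 20% test fraction, and the toy subject IDs are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def subject_disjoint_split(subject_ids, test_fraction=0.2, seed=0):
    """Split recording indices so that no subject lands in both sets.

    subject_ids[i] is the (hypothetical) subject label of recording i.
    """
    # Group recording indices by the subject who produced them.
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    # Shuffle subjects, not recordings, so each subject stays in one set.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = subjects[:n_test]
    train_subjects = subjects[n_test:]

    train = [i for s in train_subjects for i in by_subject[s]]
    test = [i for s in test_subjects for i in by_subject[s]]
    return train, test


# Toy example: eight cough recordings from five subjects.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s5"]
train_idx, test_idx = subject_disjoint_split(subject_ids)
```

Splitting at the subject level rather than the recording level is the design choice that matters here: a random split over recordings would let several coughs from the same person leak across the boundary.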

According to the arXiv preprint, MLP-small beats the mean-predictor baseline on all tasks and outperforms linear probing in 23 of 30 model × task cases. The full MLP, by contrast, overfits on small clinical data but recovers on larger datasets, revealing a trade-off between dataset size and head capacity.
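To make the mean-predictor baseline concrete: it always predicts the training-set mean, and a regression head is only useful if its mean absolute error (MAE) is lower than that. The sketch below illustrates the comparison with made-up ages and predictions; none of the numbers come from the study, and the helper names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_predictor_mae(y_train, y_test):
    """MAE of a baseline that always predicts the training-set mean."""
    constant = sum(y_train) / len(y_train)
    return mean_absolute_error(y_test, [constant] * len(y_test))


# Illustrative values only (ages in years), not results from the benchmark.
train_ages = [20.0, 30.0, 40.0, 50.0]
test_ages = [25.0, 45.0, 35.0]
head_predictions = [27.0, 41.0, 36.0]  # stand-in for a regression head's output

baseline = mean_predictor_mae(train_ages, test_ages)
head = mean_absolute_error(test_ages, head_predictions)
beats_baseline = head < baseline
```

In this toy case the baseline MAE is about 6.67 years and the head's MAE is about 2.33 years, so the head beats the baseline.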

Model Performance by Target

Model	Age regression (Coswara)	Age regression (CIDRZ)	Key observation
HeAR	MAE 9.12 yr	Excluded due to possible pretraining overlap	Leads within-dataset age regression on Coswara
OPERA-GT	Favored over OPERA-CT	Favored over OPERA-CT, margin within seed variance	Generative pretraining advantage extends from breath to cough
M2D+Resp	Near-full performance at N=50	Similar	Data-efficient on small samples

According to the authors, HeAR leads within-dataset age regression on Coswara with a mean absolute error (MAE) of 9.12 years. HeAR's results on CIDRZ are excluded from headline claims, however, because HeAR's pretraining data may overlap with CIDRZ. OPERA-GT is favored over OPERA-CT on age in all three datasets, although the CIDRZ margin falls within seed variance. The result extends a generative-pretraining advantage from breath analysis to cough.

Data Efficiency Across Models

Data efficiency varies significantly. HeAR and M2D+Resp reach near-full performance with only N=50 samples, while OPERA models require N=400 samples to achieve comparable results. This makes HeAR and M2D+Resp particularly attractive for deployment in low-data scenarios, such as emerging outbreak monitoring in under-resourced regions.
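One way to read a claim like "near-full performance at N=50" is as a point on a learning curve: the smallest training-set size whose error is within some tolerance of the error obtained with all available data. The article does not state the threshold the authors used, so the 5% tolerance below is an assumption, and both curves are invented purely for illustration.

```python
def samples_to_near_full(mae_by_n, tolerance=0.05):
    """Smallest N whose MAE is within `tolerance` (relative) of the
    MAE at the largest N in the curve.

    mae_by_n maps training-set size to test MAE.
    """
    sizes = sorted(mae_by_n)
    full_mae = mae_by_n[sizes[-1]]
    for n in sizes:
        if mae_by_n[n] <= full_mae * (1 + tolerance):
            return n
    return sizes[-1]


# Hypothetical learning curves (MAE in years), not figures from the study.
flat_curve = {50: 10.2, 100: 10.1, 200: 10.0, 400: 10.0, 800: 9.9}
steep_curve = {50: 13.0, 100: 12.1, 200: 11.0, 400: 10.3, 800: 10.0}

n_flat = samples_to_near_full(flat_curve)
n_steep = samples_to_near_full(steep_curve)
```

With these invented curves the flat one saturates at N=50 and the steep one at N=400, mirroring the qualitative contrast the study describes between the two groups of models.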

Asymmetric Cross-Dataset Transfer

Cross-dataset transfer is strongly asymmetric. The study reports that large, diverse data generalizes to small clinical populations (e.g., CoughVID to CIDRZ yields a bias of -0.17 years), but transfer in the opposite direction fails (CIDRZ to Coswara yields a bias of +2.43 years, a 26.6% increase). This highlights the importance of large, diverse training datasets when building regression models for cough audio.
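The article does not say how the transfer bias figures above are computed. One common definition, assumed here, is the mean signed error: the average of prediction minus target, which is positive when a model overestimates on the new dataset. The sketch below contrasts it with MAE on invented values; the function names and numbers are illustrative only.

```python
def signed_bias(y_true, y_pred):
    """Mean signed error: positive means the model overestimates on average."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented ages (years) for a target dataset and a transferred model's output.
target_ages = [30.0, 40.0, 50.0]
transferred_predictions = [33.0, 38.0, 52.0]

bias = signed_bias(target_ages, transferred_predictions)
mae = mean_absolute_error(target_ages, transferred_predictions)
```

Here the bias is +1.0 year while the MAE is about 2.33 years: over- and underestimates partly cancel in the bias but not in the MAE, which is why the two are usually reported together.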

For enterprise technology decision-makers, these findings have practical implications. When deploying respiratory acoustic AI in clinical or remote monitoring systems, the right choice of foundation model and regression head depends on dataset size and target population. HeAR and M2D+Resp offer data efficiency when only a small labeled dataset is available, while OPERA models may benefit from larger datasets. The asymmetric transfer results underscore the need to match training data to the deployment population.

