Class imbalance remains a persistent challenge in machine learning, especially in critical fields such as medical diagnostics and anomaly detection, where the minority class represents rare but important events. A new study posted on arXiv, titled "Beyond Rebalancing: Benchmarking Binary Classifiers Under Class Imbalance Without Rebalancing Techniques," investigates how standard binary classifiers perform when no explicit rebalancing, such as undersampling or oversampling, is applied.
Benchmarking Methodology
The authors, including Nawaz, Ali, Ahmad, Amir, and Khan, evaluated a diverse set of binary classifiers on both real-world and synthetic datasets. They progressively reduced the size of the minority class, using one-shot and few-shot scenarios as baselines to simulate extreme imbalance. They also varied data complexity by generating synthetic decision boundaries that mimic real-world conditions. For comparison, they ran additional experiments with undersampling and oversampling strategies and with one-class classification (OCC) methods.
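The data side of this protocol can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the generator `make_dataset`, its sine-shaped boundary, the `complexity` knob, and the minority sizes are all hypothetical, and the two resamplers are plain random undersampling and oversampling rather than whichever specific variants the paper used.

```python
import math
import random


def make_dataset(n_majority, n_minority, complexity, seed=0):
    """Hypothetical generator: uniform 2-D points in [-1, 1]^2, labeled by a
    sine-shaped boundary. Higher `complexity` gives a wigglier boundary
    (0 gives a straight line). Points above the curve are minority."""
    rng = random.Random(seed)
    majority, minority = [], []
    while len(majority) < n_majority or len(minority) < n_minority:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        boundary = 0.5 * math.sin(complexity * math.pi * x)
        if y > boundary:
            if len(minority) < n_minority:
                minority.append((x, y))
        elif len(majority) < n_majority:
            majority.append((x, y))
    return majority, minority


def shrink_minority(minority, k, seed=0):
    """Keep only k minority examples (k=1 is the one-shot case)."""
    return random.Random(seed).sample(minority, k)


def random_undersample(majority, minority, seed=0):
    """Drop majority examples at random until both classes are the same size."""
    return random.Random(seed).sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, seed=0):
    """Duplicate minority examples (with replacement) up to the majority size."""
    extra = random.Random(seed).choices(minority, k=len(majority) - len(minority))
    return list(majority), list(minority) + extra


majority, minority = make_dataset(n_majority=1000, n_minority=200, complexity=3)
for k in (200, 20, 5, 1):  # progressively more extreme imbalance
    few = shrink_minority(minority, k)
    under = random_undersample(majority, few)
    over = random_oversample(majority, few)
    print(f"k={k:>3}  ratio={len(majority) // k}:1  "
          f"under={len(under[0])}+{len(under[1])}  "
          f"over={len(over[0])}+{len(over[1])}")
```

In a "no rebalancing" run, a classifier would be trained directly on `majority` plus `few`; the undersampled and oversampled pairs correspond to the comparison conditions. At k=1, undersampling leaves only two training points, which illustrates why data-level fixes become fragile in the one-shot regime.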
Key Findings: Advanced Models Prevail
The study confirms that classification becomes harder as data complexity rises and the minority class shrinks. Traditional classifiers showed significant performance drops under severe imbalance. However, according to the preprint, more advanced models, specifically TabPFN and boosting-based ensembles, retained relatively higher performance and generalization ability. These models depended less on explicit rebalancing techniques to handle skewed class distributions.
| Classifier Category | Performance Under Extreme Imbalance |
|---|---|
| Traditional classifiers (e.g., logistic regression, SVM) | Deteriorates significantly |
| TabPFN | Retains relatively higher performance and generalization |
| Boosting-based ensembles (e.g., XGBoost, AdaBoost) | Retains relatively higher performance and generalization |
| One-class classification methods | Evaluated for comparison; not highlighted as a top performer |
Visual Interpretability and Metrics
The authors also used visual interpretability techniques and standard evaluation metrics to validate their findings. Although exact metric values are not specified, the approach provides a systematic comparison of how robust each classifier is under imbalanced conditions when no rebalancing is applied.
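Which metrics count as "standard" is not spelled out, so the sketch below assumes a common set: accuracy alongside minority-sensitive metrics (recall, precision, F1, and balanced accuracy). It uses only the standard library; `imbalance_metrics` is a hypothetical helper, not code from the paper. It shows why plain accuracy can mislead at a 99:1 imbalance.

```python
def imbalance_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary task; `positive` is the minority label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# 990 majority (0) and 10 minority (1) examples: a 99:1 imbalance.
y_true = [0] * 990 + [1] * 10

# A degenerate model that always predicts the majority class.
always_majority = [0] * 1000
print(imbalance_metrics(y_true, always_majority))
# accuracy=0.99, recall=0.0, f1=0.0, balanced_accuracy=0.5

# A model that wrongly flags 30 majority points but catches 8 of 10 minority points.
catches_most = [1] * 30 + [0] * 960 + [1] * 8 + [0] * 2
print(imbalance_metrics(y_true, catches_most))
# accuracy=0.968, recall=0.8, f1≈0.333, balanced_accuracy≈0.885
```

Accuracy ranks the useless always-majority model higher, while recall, F1, and balanced accuracy correctly favor the model that actually detects the rare class.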
Guidance for Practitioners
This work offers practical guidance for model selection in imbalanced learning. For enterprise teams working on rare-event tasks, such as fraud detection, equipment-failure prediction, or disease diagnosis, the results suggest that choosing a robust classifier up front can reduce the need for complex rebalancing pipelines. The study emphasizes that understanding a classifier's inherent resilience to imbalance is critical before applying data-level techniques.
The preprint is available on arXiv under a Creative Commons Attribution 4.0 International license and provides a benchmark for future work on imbalanced classification.