Tree ensembles have long dominated tabular regression, but a new study revisits smooth-basis models, specifically Chebyshev polynomial regressors and radial basis function (RBF) networks, and finds that they can match tree ensembles on accuracy while tending to show tighter generalization gaps. The research, conducted by Gerber, Luciano, Lloyd, and Huw and released as a preprint on arXiv, benchmarks these models across 55 regression datasets organized by application domain.
The researchers developed three smooth-basis models: an anisotropic RBF network with data-driven centre placement and gradient-based width optimization, a ridge-regularized Chebyshev polynomial regressor, and a hybrid Chebyshev model tree. All three are released as scikit-learn-compatible packages. The models were benchmarked against tree ensembles, a pre-trained transformer, and standard baselines, with the evaluation covering both accuracy and generalization behaviour.
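To make the idea of a ridge-regularized Chebyshev regressor concrete, here is a minimal one-dimensional sketch using only NumPy. It is not the authors' package or their algorithm: the function names `fit_chebyshev_ridge` and `predict_chebyshev`, the default `degree` and `alpha`, and the synthetic sine data are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def fit_chebyshev_ridge(x, y, degree=8, alpha=1e-3):
    """Fit a 1-D ridge-regularized Chebyshev regressor (illustrative only)."""
    lo, hi = float(x.min()), float(x.max())
    # Map inputs onto [-1, 1], the interval on which the Chebyshev basis is defined.
    t = 2.0 * (x - lo) / (hi - lo) - 1.0
    V = C.chebvander(t, degree)  # columns are T_0(t), ..., T_degree(t)
    # Closed-form ridge solution: (V^T V + alpha * I) w = V^T y
    A = V.T @ V + alpha * np.eye(degree + 1)
    coef = np.linalg.solve(A, V.T @ y)
    return coef, (lo, hi)


def predict_chebyshev(coef, bounds, x):
    """Evaluate the fitted series, clipping inputs to the training range."""
    lo, hi = bounds
    t = np.clip(2.0 * (x - lo) / (hi - lo) - 1.0, -1.0, 1.0)
    return C.chebval(t, coef)


# Hypothetical synthetic data: a noisy sine curve.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 3.0, size=200)
y_train = np.sin(2.0 * x_train) + 0.05 * rng.normal(size=200)

coef, bounds = fit_chebyshev_ridge(x_train, y_train)
x_test = np.linspace(0.1, 2.9, 50)
y_pred = predict_chebyshev(coef, bounds, x_test)
rmse = float(np.sqrt(np.mean((y_pred - np.sin(2.0 * x_test)) ** 2)))
```

The ridge penalty `alpha` shrinks the polynomial coefficients, which is what keeps higher-degree fits from oscillating between training points. A multivariate version would need a tensor-product or otherwise truncated basis, which this sketch does not attempt.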
Key Findings
According to the study, the transformer ranked first on accuracy across a majority of datasets. However, its GPU dependence, inference latency, and dataset-size limits constrain deployment in the CPU-based settings common across applied science and industry. Among CPU-viable models, smooth-basis models and tree ensembles were statistically tied on accuracy, but the smooth-basis models tended to exhibit tighter generalization gaps.
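A generalization gap can be computed in several ways, and the summary here does not state which definition the paper uses. The sketch below assumes one common choice, test RMSE minus training RMSE. The helper names `rmse` and `generalization_gap` and the toy numbers are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def generalization_gap(y_train, pred_train, y_test, pred_test):
    """Assumed definition: test RMSE minus training RMSE."""
    return rmse(y_test, pred_test) - rmse(y_train, pred_train)


# Toy example: a model that fits the training set exactly
# but misses each test target by 0.5.
gap = generalization_gap(
    y_train=[1.0, 2.0, 3.0], pred_train=[1.0, 2.0, 3.0],
    y_test=[1.5, 2.5], pred_test=[1.0, 3.0],
)
```

Under this definition, a smaller positive gap means held-out error stays closer to training error, which is the sense in which a gap is "tighter".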
Both model families are well established in numerical analysis. Their continuously differentiable prediction surfaces suit surrogate optimisation, sensitivity analysis, and other settings where the response varies gradually with the inputs.
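One practical consequence of a differentiable prediction surface is that input sensitivities can be read off analytically rather than estimated by finite differences. As a small illustration, assuming a one-dimensional Chebyshev series with made-up coefficients, NumPy's `chebder` returns the coefficients of the derivative series:

```python
from numpy.polynomial import chebyshev as C

# Hypothetical fitted coefficients: f(t) = 1*T_0(t) + 2*T_1(t) + 3*T_2(t),
# which expands to 6*t**2 + 2*t - 2 in the monomial basis.
coef = [1.0, 2.0, 3.0]

# Coefficients of df/dt, still expressed in the Chebyshev basis.
dcoef = C.chebder(coef)

# Sensitivity of the prediction to the input at t = 0.5.
# Analytically, df/dt = 12*t + 2.
slope = float(C.chebval(0.5, dcoef))
```

A tree ensemble has no comparable closed form, because its prediction surface is piecewise constant and its derivative is zero almost everywhere.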
The paper recommends routinely including smooth-basis models in the candidate pool, particularly when downstream use benefits from tighter generalization and gradually varying predictions.
Model Comparison
| Model Type | Accuracy Rank | GPU Required | Generalization Gap | Deployment Suitability |
|---|---|---|---|---|
| Transformer | 1st (majority of datasets) | Yes | Not reported | Limited (GPU-dependent) |
| Tree ensembles | Tied (among CPU-viable models) | No | Wider | CPU-friendly |
| Smooth-basis models | Tied (among CPU-viable models) | No | Tighter | CPU-friendly |
Implications for Practitioners
The results suggest that data scientists should consider smooth-basis models as viable alternatives to tree ensembles, especially in settings where prediction smoothness and generalization are critical. The availability of scikit-learn-compatible packages lowers the barrier to adoption. The findings are particularly relevant to the CPU-based inference settings common in applied science and industry.
The research team did not disclose specific funding sources or affiliations beyond the arXiv submission. The paper is available under the identifier arXiv:2602.22422.