Smooth-Basis Models Challenge Tree Ensembles in Tabular Regression Benchmark

A new study from Gerber, Luciano, Lloyd, and Huw benchmarks smooth-basis models (Chebyshev polynomial regressor, anisotropic RBF network, and a hybrid) against tree ensembles and a transformer on 55 tabular regression datasets. The transformer ranks first in accuracy but requires GPUs, while among CPU-viable models, smooth models and tree ensembles are statistically tied, with smooth models showing tighter generalization gaps.

iGEN Editorial

June 17, 2026

Tree ensembles have long dominated tabular regression, but a new study revisits smooth-basis models, namely Chebyshev polynomial regressors and radial basis function (RBF) networks, and finds they can compete on accuracy while showing tighter generalization gaps. The research, conducted by Gerber, Luciano, Lloyd, and Huw and released as a preprint on arXiv, benchmarks these models across 55 regression datasets organized by application domain.

The researchers developed three smooth-basis models: an anisotropic RBF network with data-driven centre placement and gradient-based width optimization, a ridge-regularized Chebyshev polynomial regressor, and a hybrid Chebyshev model tree. All three models are released as scikit-learn-compatible packages. These were benchmarked against tree ensembles, a pre-trained transformer, and standard baselines, with evaluation covering accuracy and generalization behaviour.
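To make the idea of a ridge-regularized Chebyshev regressor concrete, the following is a minimal one-dimensional sketch in dependency-free Python. It is not the authors' released package: the function names, the `degree` and `alpha` parameters, and the choice to penalize every coefficient (including the constant term) are assumptions made for illustration, and inputs are assumed to be pre-scaled to [-1, 1].

```python
def chebyshev_features(x, degree):
    """Return [T_0(x), ..., T_degree(x)] using T_{k+1} = 2x*T_k - T_{k-1}."""
    feats = [1.0, x]
    for _ in range(2, degree + 1):
        feats.append(2.0 * x * feats[-1] - feats[-2])
    return feats[: degree + 1]


def solve_linear(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit_chebyshev_ridge(xs, ys, degree=5, alpha=1e-3):
    """Solve (Phi^T Phi + alpha*I) w = Phi^T y for the Chebyshev coefficients.

    Assumes xs are already scaled to [-1, 1]; alpha is the ridge penalty
    and is applied to all coefficients, including the constant term.
    """
    phi = [chebyshev_features(x, degree) for x in xs]
    k = degree + 1
    gram = [
        [sum(row[i] * row[j] for row in phi) + (alpha if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve_linear(gram, rhs)


def predict(coefs, x):
    """Evaluate the fitted Chebyshev expansion at a single point x."""
    feats = chebyshev_features(x, len(coefs) - 1)
    return sum(c * t for c, t in zip(coefs, feats))


# Illustrative usage on a toy target.
xs = [-1.0 + 2.0 * i / 20 for i in range(21)]
ys = [x ** 3 - 0.5 * x for x in xs]
coefs = fit_chebyshev_ridge(xs, ys, degree=4, alpha=1e-6)
print(predict(coefs, 0.25))
```

Larger values of `alpha` shrink the coefficients and flatten the fitted curve. A multi-dimensional version would need a tensor-product or sparse basis, which this sketch does not attempt.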

Key Findings

The transformer ranked first on accuracy across a majority of datasets, according to the study. However, its GPU dependence, inference latency, and dataset-size limits constrain deployment in CPU-based settings common across applied science and industry. Among CPU-viable models, smooth models and tree ensembles were statistically tied on accuracy, but the former tended to exhibit tighter generalization gaps.
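The article does not say how the study measures a generalization gap. One common convention, assumed here purely for illustration, is the difference between test error and training error. The sketch below uses RMSE; the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def generalization_gap(train_true, train_pred, test_true, test_pred):
    """Test RMSE minus train RMSE (an assumed definition, not the paper's).

    A value near zero means the model performs about as well on unseen
    data as on the data it was fitted to.
    """
    return rmse(test_true, test_pred) - rmse(train_true, train_pred)
```

Under this convention, a "tighter" gap means a smaller positive difference, so a model with slightly worse training error can still be preferable if its test error stays close to it.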

Smooth-basis models are well established in numerical analysis. Their continuously differentiable prediction surfaces suit surrogate optimization, sensitivity analysis, and other settings where the response varies gradually with the inputs.
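To illustrate why a differentiable prediction surface helps with sensitivity analysis, here is a small sketch of a Gaussian RBF model with a separate width per centre and per input dimension, together with its closed-form gradient. The Gaussian kernel, the per-dimension width parameterization, and all names are assumptions for illustration; the article does not describe the paper's exact formulation.

```python
import math


def rbf_activation(x, centre, width):
    """Gaussian bump exp(-sum_d ((x_d - c_d) / s_d)^2) with per-dimension widths."""
    return math.exp(-sum(((xd - cd) / sd) ** 2
                         for xd, cd, sd in zip(x, centre, width)))


def rbf_predict(x, centres, widths, weights):
    """Weighted sum of anisotropic Gaussian bumps evaluated at point x."""
    return sum(w * rbf_activation(x, c, s)
               for c, s, w in zip(centres, widths, weights))


def rbf_gradient(x, centres, widths, weights):
    """Analytic gradient of rbf_predict with respect to each input dimension."""
    grad = [0.0] * len(x)
    for c, s, w in zip(centres, widths, weights):
        act = w * rbf_activation(x, c, s)
        for d in range(len(x)):
            # d/dx_d of exp(-((x_d - c_d)/s_d)^2) is -2 (x_d - c_d) / s_d^2 times the bump
            grad[d] += act * (-2.0 * (x[d] - c[d]) / s[d] ** 2)
    return grad


# Illustrative usage with two centres in two dimensions.
centres = [[0.0, 0.0], [1.0, -1.0]]
widths = [[0.5, 2.0], [1.0, 0.7]]
weights = [1.5, -0.8]
print(rbf_gradient([0.3, -0.2], centres, widths, weights))
```

Each gradient component gives the local sensitivity of the prediction to one input. A tree ensemble, being piecewise constant, has zero gradient almost everywhere, so it offers no comparable closed form.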

The paper recommends routinely including smooth-basis models in the candidate pool, particularly when downstream use benefits from tighter generalization and gradually varying predictions.

Model Comparison

Model Type	Accuracy Rank	GPU Required	Generalization Gap	Deployment Suitability
Transformer	1st (majority of datasets)	Yes	Not reported	Limited (GPU-dependent)
Tree ensembles	Tied (CPU models)	No	Wider	CPU-friendly
Smooth-basis models	Tied (CPU models)	No	Tighter	CPU-friendly

Implications for Practitioners

The results suggest that data scientists should consider smooth-basis models as viable alternatives to tree ensembles, especially where prediction smoothness and generalization are critical. The availability of scikit-learn-compatible packages lowers the barrier to adoption. The findings are particularly relevant to applied-science and industrial settings that rely on CPU-based inference.

The research team did not disclose specific funding sources or affiliations beyond the arXiv submission. The paper is available under the identifier arXiv:2602.22422.
