data science

8 stories

Software #knowledge graph #graph exploration

Initial Exploration Problem Hinders Knowledge Graph Adoption for Enterprise Users

A new paper from McNamara et al. (arXiv, 2026) theorises the Initial Exploration Problem (IEP) in knowledge graph exploration. It identifies three interdependent barriers—scope uncertainty, ontology opacity, and query incapacity—that block lay users from starting exploration. The authors argue current interfaces lack interaction primitives for scope revelation, creating a structural gap in design.

Jun 17, 2026 1 source

Smooth-Basis Models Challenge Tree Ensembles in Tabular Regression Benchmark

Technology

Artificial Intelligence #chebyshev polynomial #anisotropic rbf

A new study from Gerber, Luciano, Lloyd, and Huw benchmarks smooth-basis models (Chebyshev polynomial regressor, anisotropic RBF network, and a hybrid) against tree ensembles and a transformer on 55 tabular regression datasets. The transformer ranks first in accuracy but requires GPUs, while among CPU-viable models, smooth models and tree ensembles are statistically tied, with smooth models showing tighter generalization gaps.

Jun 17, 2026 1 source
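
For readers unfamiliar with the Chebyshev polynomial regressor named in the story above, here is a minimal one-dimensional sketch of the general technique. It is not the authors' implementation: the function names `fit_chebyshev` and `predict_chebyshev`, the plain least-squares fit, and the rescaling of inputs to [-1, 1] are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def fit_chebyshev(x, y, degree):
    """Least-squares fit of a Chebyshev series to 1-D data (illustrative only)."""
    lo, hi = x.min(), x.max()
    # Chebyshev polynomials are defined on [-1, 1], so rescale the inputs.
    t = 2.0 * (x - lo) / (hi - lo) - 1.0
    # Design matrix whose columns are T_0(t), ..., T_degree(t).
    V = C.chebvander(t, degree)
    coef, *_ = np.linalg.lstsq(V, y, rcond=None)
    return coef, (lo, hi)


def predict_chebyshev(x, coef, bounds):
    """Evaluate the fitted series at new points, reusing the training bounds."""
    lo, hi = bounds
    t = 2.0 * (x - lo) / (hi - lo) - 1.0
    return C.chebval(t, coef)


# Hypothetical toy data: a cubic is represented exactly by a degree-3 series.
x_demo = np.linspace(0.0, 4.0, 50)
y_demo = x_demo**3 - 2.0 * x_demo
coef_demo, bounds_demo = fit_chebyshev(x_demo, y_demo, degree=3)
pred_demo = predict_chebyshev(x_demo, coef_demo, bounds_demo)
```

The fitted function is a smooth polynomial in the input, in contrast to the piecewise-constant predictions of a tree ensemble.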

Study Reveals Binary Classifiers That Excel Under Extreme Imbalance Without Rebalancing

Technology

Artificial Intelligence #binary classifiers #class imbalance

A new study posted on arXiv systematically evaluates binary classifiers under class imbalance without rebalancing techniques. Results show that advanced models such as TabPFN and boosting-based ensembles maintain high performance even as the minority class shrinks, while traditional classifiers deteriorate. The research offers guidance for model selection in imbalanced learning tasks.

Jun 17, 2026 1 source

Adaptive kNN Graph Model Decouples Inference Latency from Complexity, Achieving Real-Time Classification

Technology

Artificial Intelligence #machine learning #knn

Researchers present an adaptive k-nearest neighbors graph model that decouples inference latency from computational complexity by integrating a Hierarchical Navigable Small World (HNSW) graph with a pre-computed voting mechanism. Benchmarking against eight baselines across six datasets shows real-time performance without compromising classification accuracy.

Jun 16, 2026 1 source
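
One way to picture the pre-computed voting idea in the story above is the following sketch. It is an interpretation, not the researchers' method: it assumes that "pre-computed voting" means storing each training point's neighbourhood majority label ahead of time, and it uses a brute-force nearest-neighbour search as a stand-in for the HNSW graph. The names `precompute_votes` and `classify` are hypothetical.

```python
from collections import Counter

import numpy as np


def precompute_votes(X, y, k):
    """Store, for each training point, the majority label among its k nearest
    training points (the point itself included). Done once, offline."""
    # Pairwise Euclidean distances; brute force stands in for an HNSW index.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    return np.array(
        [Counter(y[row].tolist()).most_common(1)[0][0] for row in neighbours]
    )


def classify(query, X, votes):
    """At inference time, find the single nearest stored point and return its
    pre-computed vote, so no k-way vote is tallied per query."""
    nearest = np.argmin(np.linalg.norm(X - query, axis=1))
    return votes[nearest]


# Hypothetical toy data: two well-separated clusters.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
y_demo_labels = np.array([0, 0, 0, 1, 1, 1])
votes_demo = precompute_votes(X_demo, y_demo_labels, k=3)
label_a = classify(np.array([0.2, 0.2]), X_demo, votes_demo)
label_b = classify(np.array([10.5, 10.5]), X_demo, votes_demo)
```

Under this reading, the per-query cost reduces to one nearest-neighbour lookup, which an approximate index such as HNSW would make sublinear in the training set size.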

LLMs Struggle on Privacy-Constrained Industrial Tabular Data, Study Finds

Technology

Artificial Intelligence #llms #tabular data

A new study posted on arXiv compares large language models (LLMs) with classical machine learning on an industrial car retrofit prediction task, finding that LLMs have niche uses but tree ensembles remain superior. The research highlights that on privacy-constrained tables, LLMs are more effective as complementary components than as replacements.

Jun 16, 2026 1 source

MMLongEmbed Benchmark Reveals Limitations in Long-Context Multimodal Embedding Models

Technology

Artificial Intelligence #multimodal #embedding

MMLongEmbed is the first comprehensive benchmark for evaluating multimodal embedding models (MEMs) in long-context scenarios. It comprises four retrieval tasks covering text, document, and video modalities. The evaluation reveals that current MEMs rely heavily on superficial feature matching and struggle with deep semantic and structural dependencies, with performance degrading systematically based on context length and key information placement.

Jun 16, 2026 1 source

Research Finds Anomalies in Multivariate Time Series Benchmarks Are Mostly Univariate

Technology

Artificial Intelligence #time series #anomaly detection

A study by researchers Pinet, Cumin, Berlemont, and Vaufreydaz on eight public benchmarks for multivariate time series anomaly detection (MTSAD) finds that labeled anomalies are overwhelmingly univariate—no cross-channel rupture occurs without a univariate deviation. The paper's diagnostic framework and synthetic data experiments show that current benchmarks do not justify cross-channel modeling, as channel-dependent detectors offer no measurable gain over channel-independent ones. The authors call for more structurally diverse evaluation sets.

Jun 16, 2026 1 source

New Benchmark IRTS-ToolBench Tests LLMs on Irregular Time Series Question Answering

Technology

Artificial Intelligence #ai #artificial intelligence

A research paper introduces IRTS-ToolBench, a benchmark of 1,700 questions spanning 10 task types across 13 domains to evaluate large language models (LLMs) and AI agents on irregular time series question answering (TSQA). The benchmark addresses a gap in existing TSQA benchmarks that assume regular sampling, providing standardized inputs and a reproducible evaluation protocol for verifiable agentic data science.

Jun 16, 2026 2 sources