
LLMs Struggle on Privacy-Constrained Industrial Tabular Data, Study Finds

A new study posted on arXiv compares large language models (LLMs) with classical machine learning on an industrial car retrofit prediction task and finds that, while LLMs have niche uses, tree ensembles remain superior. The authors conclude that on privacy-constrained tables, LLMs are more effective as complementary components than as replacements.

iGEN Editorial
June 16, 2026

Industrial retrofit planning relies on structured operational data, not free text. Planners must estimate whether a newly registered prototype will require a retrofit, which package it will need, and how long the work will take. A study on arXiv, titled "LLMs on Tabular Data with Limited Semantics: Evidence from Industrial Car Retrofit Prediction," examines this challenge using a real-world industrial dataset. The researchers compared strong tabular machine learning baselines with three LLM-based strategies on row-serialized inputs.
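To make "row-serialized inputs" concrete, here is a minimal sketch of how one table row might be turned into text for an LLM, and how hashing a field strips its readable meaning. The article does not describe the paper's actual serialization template or hashing scheme, so the column names, the salt, and the "column is value" format below are all hypothetical.

```python
import hashlib


def hash_value(value, salt="demo-salt"):
    """Replace a value with a short, stable, meaningless token (assumed scheme)."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:8]


def serialize_row(row, hash_fields=()):
    """Turn one table row (a dict) into a single line of text.

    Values of columns listed in hash_fields are replaced by hashed tokens,
    which keeps equality between rows but removes any readable semantics.
    """
    parts = []
    for column, value in row.items():
        if column in hash_fields:
            value = hash_value(value)
        parts.append(f"{column} is {value}")
    return "; ".join(parts)


# Hypothetical row; these field names and values are made up for illustration.
row = {"model_line": "X1", "plant": "Munich", "build_phase": "prototype"}

plain_text = serialize_row(row)
hashed_text = serialize_row(row, hash_fields=("model_line", "plant"))
print(plain_text)
print(hashed_text)
```

In the hashed version a language model sees only opaque tokens in place of the model line and plant, so it cannot draw on prior knowledge about them, while a tabular model that treats them as categories loses nothing.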

The dataset links a prototype-registration system (284,271 vehicles) with a retrofit-management system (48,716 cleaned visits). The tasks include binary occurrence prediction, 15-way retrofit-type classification, per-visit duration regression, and an aggregated monthly benchmark.

LLM Strategies Compared

The study evaluated three LLM approaches:

  • Embedding features using Amazon Titan
  • Direct prompted classification using Claude Sonnet 4
  • ML+LLM stacking (a hybrid approach)

These were pitted against classical tree ensembles and other tabular methods.
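The article does not say how the paper builds its ML+LLM stack. As a rough illustration of stacking in general, the sketch below assumes two base models (a tree ensemble and an LLM-derived classifier) have already produced out-of-fold probability scores for a binary task, and fits a small logistic-regression meta-learner on those scores. Every score, label, and hyperparameter here is invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Meta-learner probability for one row of base-model scores."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_meta_learner(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on base-model scores with batch gradient descent."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Invented out-of-fold scores: [tree_ensemble_score, llm_based_score] per row.
features = [
    [0.9, 0.6], [0.8, 0.7], [0.7, 0.4], [0.6, 0.8],
    [0.3, 0.5], [0.2, 0.3], [0.4, 0.2], [0.1, 0.4],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_meta_learner(features, labels)
print("meta-learner weights:", weights, "bias:", bias)
```

The learned weights indicate how much the stack leans on each base model. In practice the scores fed to the meta-learner should come from held-out folds, otherwise it learns to trust base models that have simply memorized the training rows.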

Key Findings

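The results show a clear pattern: classical tree ensembles remain the strongest standalone models. The LLM strategies, however, behaved consistently across tasks: embedding features stayed useful, direct prompting collapsed once hashing removed the semantic signal, and hybrid stacking produced the best manually built multiclass model.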

Strategy | Binary AUC | Multiclass weighted F1 | Notes
Embedding features (Amazon Titan) | 0.982 | - | Remains useful on tables
Direct prompted classification (Claude Sonnet 4) | 0.500 | 0.018 | Collapsed when semantic signal removed by hashing
Hybrid stacking (ML+LLM) | - | 0.626 | Best manually built multiclass model
Lag-based ML (monthly benchmark) | - | - | Outperformed time-series foundation models
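For readers unfamiliar with the metrics: a binary AUC of 0.500 is what random ranking achieves, and weighted F1 averages each class's F1 score in proportion to how often that class occurs. The sketch below computes weighted F1 from scratch on toy labels to show how a classifier that collapses onto a single label scores poorly. The labels are invented, and this is only one way a collapse can look; it is not the paper's data or evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Toy example: 8 rows of class "a", 2 rows of class "b".
y_true = ["a"] * 8 + ["b"] * 2

perfect = weighted_f1(y_true, y_true)
# A collapsed model that always answers with the rare class "b".
collapsed = weighted_f1(y_true, ["b"] * 10)
print(perfect, collapsed)
```

With 15 classes instead of two, a model stuck on one uncommon label can fall much lower still, since every other class contributes an F1 of zero.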

On the monthly benchmark, lag-based machine learning outperformed time-series foundation models, though Chronos-small remained competitive in zero-shot forecasting.
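"Lag-based" here means predicting a month's value from the values of preceding months. The article does not list the paper's features, so the following is only a generic sketch of how lag features are built from a monthly series before being handed to a regressor such as a tree ensemble. The series values and the choice of lags are made up.

```python
def make_lag_rows(series, lags=(1, 2, 3)):
    """Build (features, target) pairs where features are earlier values of the series.

    For each time step t with enough history, the features are
    series[t - lag] for every lag, and the target is series[t].
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        rows.append((features, series[t]))
    return rows


# Invented monthly retrofit counts, oldest first.
monthly_counts = [10, 12, 15, 11, 14]

for features, target in make_lag_rows(monthly_counts, lags=(1, 2)):
    print(features, "->", target)
```

Each resulting row is an ordinary tabular example, which is why standard tabular learners can be applied to forecasting without any sequence-specific architecture.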

According to the paper's abstract, "the results suggest that on privacy-constrained industrial tables, LLMs are more effective as complementary components than as replacements for strong tabular baselines."

Implications for Industrial AI

For enterprise technology buyers, the takeaways are practical. When sensitive operational data has limited semantics, or has been hashed for privacy, LLMs used directly for classification can fail dramatically (weighted F1 of 0.018). Embeddings, by contrast, can preserve useful structure (AUC 0.982), and hybrid stacking can improve multiclass predictions.

The study indicates that for industrial tabular datasets, classical machine learning, especially tree-based ensembles, still provides the most reliable results. LLMs are best deployed as feature extractors or in an ensemble with traditional models, not as standalone replacements. This is consistent with the view that for structured data without rich semantic context, traditional methods remain the default choice.

