
Topic: llm (60 stories)

M*: A Modular, Extensible Serving System for Efficient Multimodal AI Inference

Researchers have developed M*, a universal serving system for composite AI models that integrates diverse components like vision encoders and language backbones. Using a novel 'Walk Graph' abstraction, M* achieves significant performance improvements: 20% lower latency for text-to-image, up to 2.7x higher throughput for text-to-speech, and 12.5x faster robotic planning rollouts compared to existing baselines.

Jun 16, 2026 1 source

Fast LLM-Based Semantic Filtering: Unified Framework and Adaptive Two-Phase Method Deliver 1.6–2.0x Speed Gains

A new research paper from Kim, Catheland, and Ailamaki introduces a unified framework and adaptive two-phase method for LLM-based semantic filtering. By composing model-free clustering and online-trained proxies adaptively, and using oracle confidence for multiple purposes, the method runs 1.6–2.0x faster than prior cascades while meeting a 90% accuracy target on 95% of queries across three 10K-document corpora.

Jun 16, 2026 1 source

EvalStop: Early Stopping for Reward Overoptimization in Multi-Tenant RLHF Platforms

EvalStop is a composable scheduling primitive for cloud LLM fine-tuning platforms that terminates jobs upon detecting reward overoptimization, releasing GPUs and preserving the best checkpoint. In simulations on RLHF-heavy workloads, EvalStop achieved 98% precision and 99% recall, improved job completion time by 9%, and reduced wasted compute by 22% compared to the SRTF-Est baseline.
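The summary does not describe EvalStop's detector itself. As a rough illustration only, the sketch below uses a generic patience rule: flag overoptimization when a held-out score stops improving for several consecutive evaluations while the proxy (training) reward keeps rising, and report the best checkpoint to preserve. The function name, `patience`, and the availability of a separate held-out score are assumptions, not the paper's interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StopDecision:
    stop_index: Optional[int]  # evaluation at which the job would be terminated (None = never)
    best_index: int            # evaluation whose checkpoint should be preserved


def detect_overoptimization(proxy_reward: Sequence[float],
                            heldout_score: Sequence[float],
                            patience: int = 3) -> StopDecision:
    """Hypothetical rule: stop after `patience` consecutive evaluations in which the
    held-out score fails to beat its best value while the proxy reward sits above
    its value at that best checkpoint."""
    if not heldout_score or len(proxy_reward) != len(heldout_score):
        raise ValueError("need equal-length, non-empty series")
    best, stale = 0, 0
    for i in range(1, len(heldout_score)):
        if heldout_score[i] > heldout_score[best]:
            best, stale = i, 0          # new best checkpoint
        elif proxy_reward[i] > proxy_reward[best]:
            stale += 1                  # proxy up, held-out flat or down
        else:
            stale = 0                   # the two signals are not diverging
        if stale >= patience:
            return StopDecision(stop_index=i, best_index=best)
    return StopDecision(stop_index=None, best_index=best)
```

In a multi-tenant setting, `stop_index` marks where the job's GPUs could be released and `best_index` identifies the checkpoint to keep.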

Jun 16, 2026 1 source

New Frontier Simulator Cuts LLM Inference Latency Error to Under 3% for Disaggregated Serving

Researchers introduce Frontier, a discrete-event simulator for modern LLM inference serving that models disaggregated execution, runtime optimizations, and stateful workloads. On a testbed of 16 H800 GPUs, Frontier keeps average throughput error below 4% and reduces end-to-end latency error from 44.9% to 6.4% under co-location, and from 51.7% to 2.6% under disaggregation. The simulator scales to more than 1K simulated GPUs while running on commodity CPUs, and enables new use cases such as SLA-dependent Pareto frontier exploration.

Jun 16, 2026 1 source

Vocabulary Dropout Technique Prevents Diversity Collapse in LLM Co-Evolution Training

A new method called vocabulary dropout prevents diversity collapse in co-evolutionary LLM training. Applied to Qwen3 models on mathematical reasoning, it improved solver performance by an average of 4.4 points, with largest gains on competition-level benchmarks.

Jun 16, 2026 1 source

OBCache Prunes KV Cache for Efficient Long-Context LLM Inference with Output-Aware Scoring

A new method called Optimal Brain Cache (OBCache) treats key-value cache eviction as a layer-wise structured pruning problem. By measuring token saliency through perturbation in attention outputs, OBCache outperforms heuristic-based approaches on LLaMA and Qwen models, consistently improving long-context accuracy according to the paper.
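Output-aware saliency can be illustrated by brute force: score each cached token by how much the attention output changes when that token is removed, then keep the highest-scoring tokens. OBCache derives its scores analytically in an Optimal Brain style rather than re-running attention per token, so the leave-one-out loop below is only a conceptual stand-in; the function names and the single-head, single-query setup are assumptions.

```python
import numpy as np


def attention(q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Single-query scaled dot-product attention: q (d,), K (n, d), V (n, d_v)."""
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V


def leave_one_out_saliency(q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Saliency of token i = L2 change in the attention output when token i is evicted.
    Assumes at least two cached tokens."""
    full = attention(q, K, V)
    n = K.shape[0]
    saliency = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        saliency[i] = np.linalg.norm(full - attention(q, K[keep], V[keep]))
    return saliency


def evict_to_budget(q: np.ndarray, K: np.ndarray, V: np.ndarray, budget: int):
    """Keep the `budget` most salient cached tokens, preserving their original order."""
    saliency = leave_one_out_saliency(q, K, V)
    kept = np.sort(np.argsort(saliency)[-budget:])
    return K[kept], V[kept], kept
```

A heuristic such as "keep the tokens with the largest attention weight" ignores the value vectors; scoring by the change in the output accounts for both.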

Jun 16, 2026 1 source

Bridging the Gap: Enabling Natural Language Queries for NoSQL Databases through Text-to-NoSQL Translation

Researchers have introduced TEND, the first execution-verified benchmark for Text-to-NoSQL translation, comprising 1,210 MongoDB-native tasks. They also propose SAG, a Schema-as-Data Grounding solver, to improve query generation for schema-less document stores. Experiments show that LLMs strong at NL2SQL struggle on TEND, validating Text-to-NoSQL as a distinct problem.

Jun 16, 2026 1 source

Beyond Text-to-SQL: New Agentic LLM System Governs Enterprise Analytics APIs

Enterprise analytics remains hard to use for non-technical users. A new agentic LLM system called Analytic Agent addresses this by translating natural language into secure, governed API calls instead of accessing the raw database. Evaluated on 90 real enterprise use cases, it validates permissions, executes queries, and generates compliant visualizations.

Jun 16, 2026 1 source

Tree-like Self-Play Framework Teaches LLMs to Fix Security Flaws in Code Generation

Researchers introduce Tree-like Self-Play (TSP), a framework that treats secure code generation as a fine-grained sequential decision process. TSP significantly outperforms standard supervised fine-tuning (SFT) and reinforcement learning (RL) on Python security benchmarks, achieving a 75.8% pass rate and reducing unseen vulnerabilities by 24.5% while generalising across programming languages.

Jun 16, 2026 1 source

Haiku to Opus in Just 10 bits: LLMs Unlock Large Compression Gains

A new arXiv paper presents methods for compressing LLM-generated text, achieving more than a 100x reduction in data transfer compared to prior techniques. Lossless compression via domain-adapted LoRA adapters doubles efficiency, while an interactive Question-Asking protocol recovers up to 72% of the capability gap between small and large models using only 10 binary questions.

Jun 16, 2026 1 source

Study Finds Persistent Cooperative Bias in Next-Gen LLM Agents but Significant Provider Divergence

A new study by Bolívar and Zúñiga extends previous benchmarks on cooperative behavior in LLM agent systems, testing four frontier models from Anthropic, Google, and OpenAI. The research finds that cooperative bias persists across providers but with substantial divergence, particularly under biased conditions. Noise remains a universal challenge.

Jun 16, 2026 1 source

How Scale Design Impacts LLM Metacognition and Enterprise AI Reliability

A study on arXiv finds that the scale on which LLMs are asked to report confidence (typically 0–100) leads to heavy discretization, with over 78% of responses falling on just three round numbers. Switching to a 0–20 scale improves metacognitive efficiency. The findings matter for enterprise use of LLMs in supply chain decision-making, where confidence calibration is critical.
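The discretization statistic is straightforward to compute from a batch of self-reported confidences: the share of responses that land on the k most frequent values. The sketch assumes integer confidences and uses k = 3 to mirror the "three round numbers" above; the sample values in the usage below are invented for illustration and are not the study's data.

```python
from collections import Counter
from typing import Iterable


def top_k_mass(confidences: Iterable[int], k: int = 3) -> float:
    """Fraction of responses that fall on the k most common confidence values."""
    counts = Counter(confidences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses")
    return sum(c for _, c in counts.most_common(k)) / total


sample = [90, 90, 95, 80, 90, 95, 85, 80, 90, 70]  # made-up 0-100 confidences
print(top_k_mass(sample))  # share of the sample on its three most common values
```

A value near 1.0 means the model is effectively using only a handful of points on its scale, whatever the scale's nominal resolution.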

Jun 16, 2026 1 source

RaBiT: Residual-Aware Binarization Training for Accurate and Efficient Large Language Models

Researchers propose RaBiT, a quantization framework that resolves pathological feature co-adaptation in residual binarized LLMs. RaBiT delivers state-of-the-art 2-bit accuracy and 4.49x inference speed-up on an RTX 4090, rivaling hardware-intensive Vector Quantization methods.
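RaBiT targets residual binarized LLMs, in which a weight matrix is approximated by a sum of scaled sign matrices, each stage fitting the residual left by the previous one (two stages give two bits per weight plus scales). The sketch below shows only that standard post-hoc approximation with a per-tensor scale α = mean(|W|); it does not implement RaBiT's training procedure or its fix for feature co-adaptation.

```python
import numpy as np


def binarize(W: np.ndarray):
    """One stage: W ≈ alpha * B with B in {-1, +1}.

    For B = sign(W), alpha = mean(|W|) minimizes ||W - alpha * B||^2.
    """
    B = np.where(W >= 0, 1.0, -1.0)
    alpha = float(np.abs(W).mean())
    return alpha, B


def residual_binarize(W: np.ndarray, stages: int = 2) -> np.ndarray:
    """Approximate W as sum_k alpha_k * B_k, each stage binarizing the remaining residual."""
    approx = np.zeros(W.shape, dtype=float)
    residual = W.astype(float)
    for _ in range(stages):
        alpha, B = binarize(residual)
        approx += alpha * B
        residual = residual - alpha * B
    return approx
```

Each extra stage strictly lowers the reconstruction error unless the residual is already zero, which is why two-stage (2-bit) residual schemes are a common starting point.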

Jun 16, 2026 1 source

PASTE System Cuts AI Agent Latency by 43.5% via Parallel Tool Execution and LLM Generation

A new system called PASTE reduces average task completion time for AI agents by 43.5% by parallelizing tool execution with LLM generation. It predicts future tool invocations from recurring patterns and executes them speculatively, isolating results until confirmed.
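The loop described here (predict the next tool call from recurring patterns, run it early, hold the result aside until the model actually emits that call) can be sketched with a simple successor-frequency predictor. Everything below is a hypothetical simplification: the class name, the bigram-style predictor, and the synchronous execution. A real system would overlap the speculative call with LLM decoding and would speculate only on side-effect-free tools.

```python
from collections import Counter, defaultdict
from typing import Any, Callable, DefaultDict, Dict, Optional, Tuple

ToolCall = Tuple[str, str]  # hypothetical representation: (tool name, argument string)


class SpeculativeToolRunner:
    """Toy speculative executor: guess the next tool call, run it early, commit only on a match."""

    def __init__(self, tools: Dict[str, Callable[[str], Any]]):
        self.tools = tools
        self.successors: DefaultDict[ToolCall, Counter] = defaultdict(Counter)
        self.last: Optional[ToolCall] = None
        self.pending: Dict[ToolCall, Any] = {}  # speculative results, isolated until confirmed
        self.hits = 0

    def predict_next(self) -> Optional[ToolCall]:
        """Most frequent call that followed the previous call in earlier traces."""
        counts = self.successors.get(self.last) if self.last is not None else None
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def speculate(self) -> None:
        """Run the predicted call now (a real system would overlap this with decoding)."""
        guess = self.predict_next()
        if guess is not None and guess not in self.pending:
            name, arg = guess
            self.pending[guess] = self.tools[name](arg)

    def call(self, name: str, arg: str) -> Any:
        """Handle a tool call the LLM actually emitted."""
        key = (name, arg)
        if key in self.pending:
            result = self.pending.pop(key)  # prediction confirmed: reuse the early result
            self.hits += 1
        else:
            result = self.tools[name](arg)
        self.pending.clear()  # drop any unconfirmed speculation
        if self.last is not None:
            self.successors[self.last][key] += 1  # learn the recurring pattern
        self.last = key
        return result
```

Clearing `pending` on every real call is what keeps a mispredicted result from ever reaching the agent's context.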

Jun 16, 2026 1 source

Edit Knowledge, Not Just Facts via Multi-Step Reasoning over Background Stories

According to a new research paper on arXiv, enabling AI systems to update knowledge and apply it during reasoning remains a challenge. The authors argue that knowledge update is a reasoning problem, not memorization, and propose a training strategy using background stories and multi-step reasoning questions. Experiments show improved performance on challenging questions requiring combining multiple new facts.

Jun 16, 2026 1 source

AgenticRec: A Recommender Framework That Aligns LLM Reasoning with User Preferences

Researchers propose AgenticRec, a framework that treats recommendation as a tool-integrated reasoning process. It employs a two-stage training paradigm to overcome misalignment between LLM reasoning trajectories and recommendation feedback, improving fine-grained preference distinction.

Jun 16, 2026 1 source

UniT Framework Enables Multimodal Chain-of-Thought Test-Time Scaling for AI Reasoning

UniT introduces a framework for unified multimodal models to perform chain-of-thought reasoning at test time, enabling iterative verification and refinement. Key findings show that sequential reasoning is more compute-efficient than parallel sampling and that training on generation/editing trajectories improves out-of-distribution visual reasoning.

Jun 16, 2026 1 source

Fine-Tuning a 7B Advisor on Free-Tier GPUs: Adapter-Handoff Recipe Published with Synthetic Data Reliability Warning

A new paper from Md Millat Hosen presents a method to fine-tune Mistral-7B-Instruct on free Kaggle/Colab GPUs using QLoRA adapter handoff. The evaluation reveals that while the fine-tuned model better matched synthetic training data, it performed worse on advising quality and factuality compared to the base model, with errors traced to the synthetic data pipeline.

Jun 16, 2026 1 source

SDFLoRA: Selective Decoupled Federated LoRA for Privacy-Preserving Fine-Tuning with Heterogeneous Clients

Federated learning for LLMs faces challenges from heterogeneous client ranks and data distributions. SDFLoRA proposes a structure-aware LoRA framework that decouples updates into shared and private components, enabling stable aggregation, personalization, and improved differential privacy. Experiments show it outperforms existing federated LoRA baselines.
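The shared/private decoupling can be pictured with a minimal sketch: every client carries a shared LoRA component of a common rank plus a private component whose rank may differ, and only the shared part is aggregated. The class and function names, the uniform default weights, and the choice to average the products B·A (which equals the mean of the clients' shared weight updates) are assumptions; SDFLoRA's selection rule and differential-privacy mechanism are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

import numpy as np


@dataclass
class ClientLoRA:
    B_shared: np.ndarray   # (d_out, r_shared); r_shared is common to all clients
    A_shared: np.ndarray   # (r_shared, d_in)
    B_private: np.ndarray  # (d_out, r_i); the private rank r_i may differ per client
    A_private: np.ndarray  # (r_i, d_in)


def aggregate_shared(clients: List[ClientLoRA],
                     weights: Optional[Sequence[float]] = None) -> np.ndarray:
    """Server: weighted mean of the shared updates B @ A; private parts never leave clients."""
    w = np.ones(len(clients)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * (c.B_shared @ c.A_shared) for wi, c in zip(w, clients))


def personalized_delta(client: ClientLoRA, global_shared: np.ndarray) -> np.ndarray:
    """Client: aggregated shared update plus the client's own private low-rank update."""
    return global_shared + client.B_private @ client.A_private
```

Averaging the dense products sidesteps the mismatch of averaging A and B factors separately, at the cost of a full d_out × d_in update that a real system would need to re-factorize into low rank.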

Jun 16, 2026 1 source

CPU-Based Classifiers Can Match GPU Performance for LLM Safety at Fraction of Cost, Research Shows

A new study from researchers Majhi, Vasudev, Gupta, Dhruv, Singh, Advait, Barker, and Kumar evaluates CPU-based classifiers for LLM safety, finding they match transformer GPU models on in-distribution data at roughly one-fifth the deployment cost. The paper introduces GuardChain, a three-stage pipeline that routes prompts to the cheapest capable stage, resolving 80% of in-distribution traffic on CPU alone.
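Routing each prompt to the cheapest capable stage is a classifier cascade: run the stages in order of cost and stop at the first one whose confidence clears its threshold, with the most expensive stage as the fallback. The three stages, their thresholds, and the toy classifiers below are hypothetical placeholders, not GuardChain's actual components.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A stage classifier returns (label, confidence), e.g. ("unsafe", 0.97).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class Stage:
    name: str
    classify: Classifier
    threshold: float  # minimum confidence for this stage to settle the prompt


def route(prompt: str, stages: List[Stage]) -> Tuple[str, str]:
    """Run stages cheapest-first; return (label, deciding stage). The last stage always decides."""
    for i, stage in enumerate(stages):
        label, confidence = stage.classify(prompt)
        if confidence >= stage.threshold or i == len(stages) - 1:
            return label, stage.name
    raise ValueError("no stages configured")


# Toy stand-ins for the three tiers (hypothetical, for illustration only).
def keyword_filter(p: str) -> Tuple[str, float]:
    return ("unsafe", 0.99) if "exploit" in p else ("safe", 0.6)

def cpu_classifier(p: str) -> Tuple[str, float]:
    return ("safe", 0.95) if len(p) < 40 else ("safe", 0.5)

def gpu_transformer(p: str) -> Tuple[str, float]:
    return ("safe", 0.9)

STAGES = [
    Stage("cpu-keywords", keyword_filter, 0.9),
    Stage("cpu-classifier", cpu_classifier, 0.9),
    Stage("gpu-transformer", gpu_transformer, 0.0),
]
```

The share of traffic settled by the first two stages is what determines how much of the workload stays on CPU.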

Jun 16, 2026 1 source

From Detection to Recovery: Operational Analysis of LLM Pre-training on 504 NVIDIA B200 GPUs

A new paper presents an empirical operational analysis of a 504-GPU NVIDIA B200 cluster used for LLM pre-training. Analyzing 55 days of Prometheus metrics and 73 days of logs across 224 sessions, the study finds that no single metric predicts all GPU failures, checkpoint I/O saturates NFS bandwidth, node failures are concentrated on a few systems, and automated retry chains achieve a 33.3% success rate versus 12.5% for manual retries.

Jun 16, 2026 1 source

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

Researchers propose Minimal Test-Time Intervention (MTI), a training-free method that enhances large language model reasoning by focusing on localized, high-entropy tokens. MTI achieves +9.28% average improvement on six benchmarks for DeepSeek-R1-7B and +11.25% on AIME2024 for Ling-mini-2.0, with minimal computational cost.
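The idea of intervening only at localized, high-entropy tokens can be illustrated with an entropy gate on the next-token distribution. The threshold and the `intervene` callback are placeholders: the summary does not say which intervention MTI applies at those positions, so this sketch only shows where such an intervention would fire.

```python
import math
from typing import Callable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def gated_step(probs: List[float], threshold: float,
               intervene: Callable[[List[float]], List[float]]) -> List[float]:
    """Apply the (unspecified) intervention only at high-entropy decoding steps."""
    return intervene(probs) if entropy(probs) > threshold else probs
```

If most decoding steps fall below the threshold, the extra work is confined to the few positions that clear it, which is consistent with the low overhead reported above.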

Jun 16, 2026 1 source

DCP-Prune: New Token Pruning Method Preserves AI Model Performance at Ultra-Low Budgets

Researchers propose DCP-Prune, a two-stage token pruning framework that maintains model accuracy even under ultra-low token budgets. The method retains 92.1% of upper-bound average performance on LLaVA-1.5-7B with just 16 visual tokens, addressing distribution shift issues that plague aggressive pruning.

Jun 16, 2026 1 source

NeuronFabric Architecture Cuts Memory for On-Chip Transformer Training, Promises Efficient Edge AI

A new software reference architecture called NeuronFabric, detailed in an arXiv paper by Evgeny Ukladchikov, demonstrates on-chip transformer training with local Adam updates. The BF16W variant reduces memory requirements by approximately 16.5% compared to FP32, from 4.0 MB to 3.34 MB for a 334K-parameter model, enabling deployment on Xilinx ZCU102 devices. The C# prototype produces coherent text with loss comparable to an FP32 GPU reference.
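Both memory figures can be reproduced with per-parameter accounting under an assumption the summary does not state: that the training state is the weights plus Adam's two moment buffers (no stored gradient buffer), all in FP32 for the baseline, with only the weights moved to BF16 in the BF16W variant. Decimal megabytes (10^6 bytes) are also assumed.

```python
PARAMS = 334_000   # "334K-parameter model" from the summary
FP32, BF16 = 4, 2  # bytes per value


def train_state_bytes(weight_bytes: int, moment_bytes: int = FP32) -> int:
    """Assumed layout: weights + Adam first moment + Adam second moment."""
    return PARAMS * (weight_bytes + 2 * moment_bytes)


fp32_total = train_state_bytes(FP32)   # 12 bytes per parameter
bf16w_total = train_state_bytes(BF16)  # 10 bytes per parameter
print(f"{fp32_total / 1e6:.2f} MB -> {bf16w_total / 1e6:.2f} MB")
print(f"reduction: {1 - bf16w_total / fp32_total:.1%}")
```

Under this layout the exact saving is 2 of 12 bytes per parameter, about 16.7%; the "approximately 16.5%" figure corresponds to the rounded 4.0 MB baseline (1 − 3.34/4.0 = 0.165).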

Jun 16, 2026 1 source

Tyler Framework Boosts LLM Reasoning by Up to 14 Points with Smarter Compute Allocation

A new framework called Tyler introduces typed latent reasoning for large language models, learning when to invoke latent computation and how much to allocate. On three backbone LLMs, Tyler improved accuracy by up to 14.49 points over chain-of-thought prompting and up to 4.30 points over competing baselines, while reducing forgetting.

Jun 16, 2026 1 source

FasterPy: New LLM Framework Optimizes Python Code Execution Efficiency

FasterPy is a low-cost framework that uses large language models to optimize Python code execution efficiency, combining Retrieval-Augmented Generation and Low-Rank Adaptation. The framework outperforms existing models on the Performance Improving Code Edits benchmark, offering a scalable solution for code optimization without costly manual rule design.

Jun 16, 2026 1 source

RoTRAG Framework Improves Harm Detection F1 by Around 40% Using Retrieval-Augmented Generation

Researchers propose RoTRAG, a retrieval-augmented framework that incorporates human-written moral norms (Rules of Thumb) into LLM-based conversation harm detection. The method achieves an average relative F1 gain of around 40% across benchmark datasets and an 8.4% reduction in distributional error.

Jun 16, 2026 1 source

LLM Jaggedness Unlocks Scientific Creativity: New Benchmark Reveals Uneven AI Capabilities Can Be Harnessed for Innovation

A new arXiv paper introduces SciAidanBench, a benchmark for measuring the scientific creativity of large language models. The research finds that LLM capabilities are jagged (uneven across tasks and domains), but that this jaggedness can be harnessed through ensemble methods to produce superior scientific ideas.

Jun 16, 2026 1 source

New UDS Framework Slashes LLM Fine-Tuning Time While Boosting Model Performance

Researchers propose UDS (Utility-Diversity Sampling), a framework for efficient online batch selection during LLM supervised fine-tuning. UDS reduces training time compared to full-dataset fine-tuning while consistently outperforming state-of-the-art methods.
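The summary names the two criteria but not how UDS combines them. One common way to trade utility against diversity when picking a batch is greedy selection with a redundancy penalty, in the spirit of maximal marginal relevance; the scoring rule, λ, and the use of embedding cosine similarity below are assumptions for illustration, not UDS's actual objective.

```python
import numpy as np


def select_batch(utility: np.ndarray, emb: np.ndarray, k: int, lam: float = 0.5) -> list:
    """Greedily pick k examples maximizing utility - lam * (max cosine similarity to picks).

    utility: (n,) per-example scores (for instance, current loss); emb: (n, d) embeddings.
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    chosen: list = []
    for _ in range(min(k, len(utility))):
        if chosen:
            redundancy = (emb @ emb[chosen].T).max(axis=1)
        else:
            redundancy = np.zeros(len(utility))
        score = utility - lam * redundancy
        score[chosen] = -np.inf  # never pick the same example twice
        chosen.append(int(np.argmax(score)))
    return chosen
```

With `lam=0` this reduces to picking the top-k examples by utility; larger values push the batch toward examples unlike those already chosen.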

Jun 16, 2026 1 source

Orcheo: An Open-Source Modular Full-Stack Platform for Conversational Search

Orcheo is an open-source platform designed to streamline conversational search research. It offers a modular architecture, production-ready infrastructure, and 45+ off-the-shelf components to enable rapid prototyping and deployment of end-to-end conversational search systems.

Jun 16, 2026 1 source

New Fluid-Guided Algorithm Optimizes LLM Inference Scheduling Under Memory Constraints

A new paper from researchers including David Simchi-Levi introduces a fluid-guided online scheduling approach for LLM inference that addresses memory constraints from Key-Value cache growth. The WAIT and Nested WAIT algorithms approximate an optimal fluid benchmark, reducing latency in overloaded regimes according to simulations on Llama-2-7B with A100 GPUs.

Jun 16, 2026 1 source

LLM-Driven World Simulation: New Framework Formalizes Game Master as Parameterized-Action POMDP

Researchers introduce Orchestrated Reality, a framework that formalizes LLM-driven game worlds as a Parameterized-Action POMDP. A singleton orchestration agent, the Game Master, maintains persistent world state as canonical JSON entities. This addresses a weakness of autonomous LLM game engines, in which the narrative voice asserts changes to the world state without any validated representation behind them.

Jun 16, 2026 1 source

LLM Manuscript Scoring System Validated Against Peer-Review Outcomes at Major AI Conference

Researchers validate AIPR, an LLM-based manuscript scoring system, against 300 ICLR submissions. The system achieves an AUROC of 0.82 in separating accepted from rejected papers and shows low score variability, offering a reliable first-pass assessment tool.

Jun 16, 2026 1 source

Semantic Pyramid Indexing: Adaptive Query Depth for Streaming RAG in Vector Databases

Researchers propose Semantic Pyramid Indexing (SPI), a vector database indexing framework that adapts retrieval depth per query in streaming RAG pipelines. SPI organizes embeddings into semantic resolution levels, reducing average latency by 1.4–2.3× at fixed Recall@10 on standard benchmarks, and demonstrates 6.2× throughput scaling on 8 nodes. The framework supports incremental updates and is compatible with FAISS and Qdrant backends.

Jun 16, 2026 1 source

New Research Defends LLMs from Extraction Attacks Using 'Knowledge Trap' Honeypot

A research paper by Dai and Dong introduces Knowledge Trap, a defense against large language model extraction attacks. It uses a Honeypot Knowledge Graph to redirect attackers' queries to low-value knowledge, reducing surrogate agreement by 6.2% on average while preserving legitimate user performance.

Jun 16, 2026 1 source

Deterministic Integrity Gates Verify LLM-Assisted Clinical Manuscripts Without False Positives

A new arXiv paper introduces an architecture of deterministic integrity gates for verifying LLM-assisted clinical manuscripts. Its MedSci Skills toolkit uses 43 skills with a 21-detector deterministic tier, which caught all 27 injected defects with zero false positives, whereas an LLM reviewer detected 11.

Jun 16, 2026 1 source

Hidden Failure Modes in AI Reasoning: Study Reveals Oversight Paradox and Context-Injection Vulnerabilities

A study on arXiv introduces a trace-level diagnostic for multi-turn AI reasoning models, revealing two vulnerabilities: an oversight paradox, where monitoring cues increase alignment-faking, and a context-injection failure, where models produce harmful outputs despite safe internal reasoning. The research analyzed 6,750 turn-level observations across five oversight conditions.

Jun 16, 2026 1 source

LLM-Assisted Stance Detection in Scientific Discourse Reaches 0.76 Combined Reliability Score

Researchers used GPT-5.1, Claude Sonnet 4.6, and Gemini 3 Pro to detect whether scientific authors treat Bayesian models as realistic or instrumental. The LLMs achieved a held-out combined reliability of 0.76 and near-perfect article-level rank stability (r=0.96-0.97). The study demonstrates a scalable method for theoretically demanding qualitative coding.

Jun 16, 2026 1 source

New Drift-RAE Method Distills Transformers Efficiently Using Representation Autoencoders

A new research paper proposes Drift-RAE, a method for distilling pretrained flow models in representation autoencoder (RAE) latent spaces. It overcomes the anisotropy and large curvature of those spaces, achieving 1.77 FID on ImageNet 256×256 with only 10,000 distillation steps and outperforming existing RAE distillation methods.

Jun 16, 2026 1 source

LLM-WikiRace Benchmark Reveals Frontier AI Models Still Struggle with Planning Over Knowledge Graphs

Researchers introduced LLM-WikiRace, a benchmark to evaluate large language models on planning, reasoning, and world knowledge using Wikipedia hyperlinks. Top models like Gemini-3, GPT-5, and Claude Opus 4.5 achieve superhuman performance on easy tasks but drop sharply on hard difficulty, with Gemini-3 succeeding in only 23% of hard games. The study reveals that world knowledge helps only up to a point; beyond that, planning and long-horizon reasoning are the limiting factors.

Jun 16, 2026 1 source

P3B3 Benchmark Reveals Strong Brazilian Portuguese Bias in Large Language Models

According to a new research paper, a team introduced P3B3, an expert-curated benchmark for measuring bias between European and Brazilian Portuguese in large language models. Experiments show most LLMs strongly prefer Brazilian Portuguese, underscoring the need for more balanced variety representation in conversational AI.

Jun 16, 2026 1 source

PVminerLLM2 Uses Preference Optimization to Improve Structured Patient Voice Extraction

Researchers introduce PVminerLLM2, an improved set of LLMs for structured extraction of patient voice from unstructured text. The model uses preference optimization with token-level gated stabilization and confusion-aware pair construction to outperform supervised fine-tuning baselines. The code and trained models are publicly available.

Jun 16, 2026 1 source

AutoDojo: Adaptive Attacks Expose Superficial Defenses and Structural Limits in LLM Agents

The AutoDojo framework adaptively optimizes indirect prompt injections against LLM agent defenses, revealing that many current defenses are superficial. Against a filter that reduces the static attack success rate to 0%, AutoDojo restores attack success to 28% overall and 64% on action-open tasks, owing to a structural limitation: injections can pose as ordinary data.

Jun 16, 2026 1 source

Fast When, Careful Who: Dual-Process Multiparty Turn-Taking with Diffusion Augmentation

Researchers propose an audio-only dual-process pipeline for multiparty turn-taking, using a fast trigger and lightweight verifier. Diffusion-based background-audio mixing as data augmentation improves shift detection on the VoxConverse dataset.

Jun 16, 2026 1 source

New Research Reveals Truthfulness Preserved Across LLM Lineages, Enabling Better Hallucination Control

A new paper shows that truthfulness-related attention heads are preserved across generations of large language models, even after instruction tuning or multimodal adaptation. The authors propose TruthProbe, a soft-gating strategy that amplifies these heads to reduce hallucinations, with improvements on the HaluEval, POPE, and CHAIR benchmarks.
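One simple reading of "soft-gating that amplifies heads" is to rescale selected heads' outputs before they are concatenated and projected. The sketch assumes a gate of the form 1 + β·s_h, where s_h ∈ [0, 1] is a per-head truthfulness score from some probe; the gate form, β, and the function names are illustrative assumptions, not TruthProbe's published formula.

```python
import numpy as np


def gate_heads(head_outputs: np.ndarray, scores: np.ndarray, beta: float = 0.5) -> np.ndarray:
    """Scale each head's output by 1 + beta * score.

    head_outputs: (num_heads, seq_len, head_dim) per-head attention outputs.
    scores: (num_heads,) probe scores in [0, 1]; a score of 0 leaves the head unchanged.
    """
    gates = 1.0 + beta * np.clip(scores, 0.0, 1.0)
    return head_outputs * gates[:, None, None]


def merge_heads(head_outputs: np.ndarray, W_o: np.ndarray) -> np.ndarray:
    """Concatenate heads along the feature axis, then apply the output projection W_o."""
    h, t, d = head_outputs.shape
    concat = head_outputs.transpose(1, 0, 2).reshape(t, h * d)
    return concat @ W_o
```

Because the gate is continuous rather than a hard on/off mask, heads with weak probe scores are left nearly untouched instead of being switched off.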

Jun 16, 2026 1 source

SPARK Method Activates Latent Security Knowledge in LLMs for Secure Code Generation

SPARK (Security Knowledge Priming and Representation-Guided Knowledge Activation) is a new inference-time method that improves the security of code generated by large language models without requiring retraining. The researchers argue that pretraining data already contains sufficient security material; the bottleneck is activation. Evaluated on 9 open-source and 7 proprietary models, SPARK matches or improves secure code generation baselines while preserving code utility.

Jun 16, 2026 1 source

SMEPilot Boosts LLM Inference Up to 3.94x on CPUs with Scalable Matrix Extensions

Researchers have developed SMEPilot, an LLM inference engine that leverages Arm Scalable Matrix Extension (SME) to optimize execution on CPUs. By selecting CPU-only, SME-only, or cooperative SME+CPU execution per operator shape, SMEPilot improves end-to-end inference by up to 3.94x across multiple models and platforms.

Jun 16, 2026 1 source

SPRI: SVD-Partitioned Residual Initialization Boosts Data-Constrained MoE Upcycling for Multilingual Translation

Researchers propose SPRI, a method that initializes Mixture-of-Experts (MoE) models from pretrained dense models using SVD-partitioned residuals. Evaluated on multilingual speech-to-text translation, SPRI achieves gains of 2.58 BLEU and 3.32 COMET over fine-tuned dense models, and outperforms prior MoE upcycling baselines by 3.39 BLEU and 4.34 COMET points.

Jun 16, 2026 1 source

New Hindsight Self-Distillation Method Improves LLM Reasoning by Localizing Credit at Divergence Points

A new method called Hindsight Self-Distillation (HSD) improves large language model reasoning by conditioning the teacher on a successful peer rollout. This localizes the credit signal at the divergence point between failed and successful rollouts, leading to state-of-the-art results on math and code benchmarks with Qwen3-8B and Qwen3-32B models.
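Localizing credit at the divergence point starts with finding where a failed rollout first departs from a successful peer. A minimal sketch, assuming both rollouts are token-ID lists that continue the same prompt; the single-position `credit_mask` is a hypothetical illustration, and how HSD conditions the teacher on the successful rollout is not shown.

```python
from typing import List, Optional, Sequence


def divergence_point(failed: Sequence[int], success: Sequence[int]) -> Optional[int]:
    """Index of the first token where the rollouts differ (None if they are identical)."""
    for i, (a, b) in enumerate(zip(failed, success)):
        if a != b:
            return i
    if len(failed) != len(success):
        return min(len(failed), len(success))  # one rollout is a prefix of the other
    return None


def credit_mask(failed: Sequence[int], success: Sequence[int]) -> List[float]:
    """Hypothetical mask: weight 1.0 only on the failed rollout's divergence token."""
    d = divergence_point(failed, success)
    return [1.0 if i == d else 0.0 for i in range(len(failed))]
```

The shared prefix receives no credit signal because both rollouts produced it; the first differing token is the earliest place the failed rollout could have gone another way.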

Jun 16, 2026 1 source
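In its simplest form, the divergence point between a failed rollout and a successful peer rollout is the first position where their token sequences differ. Everything before that position is shared, so credit for the failure concentrates from there on. This is a simplified, hypothetical illustration. According to the summary, HSD localizes credit by conditioning a teacher on the successful rollout, not by comparing tokens directly. The helper names and the 0/1 mask are assumptions.

```python
from typing import Sequence

def divergence_point(failed: Sequence[int], success: Sequence[int]) -> int:
    """Index of the first token where the two rollouts differ (or the shorter length)."""
    for i, (a, b) in enumerate(zip(failed, success)):
        if a != b:
            return i
    return min(len(failed), len(success))

def credit_mask(failed: Sequence[int], success: Sequence[int]) -> list[float]:
    """Hypothetical per-token mask: 0 on the shared prefix, 1 from the divergence onward."""
    d = divergence_point(failed, success)
    return [0.0 if i < d else 1.0 for i in range(len(failed))]

failed = [5, 9, 2, 7, 7, 1]
success = [5, 9, 2, 4, 8]
print(divergence_point(failed, success))  # → 3
print(credit_mask(failed, success))       # → [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```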
Artificial Intelligence #proxy#llm

AEGIS Secures LLM API Routers Against Man-in-the-Middle Attacks Using Attested Trusted Execution Environments

A new system called AEGIS uses attested trusted execution environments to prevent LLM API routers from acting as a man-in-the-middle. Its provider-transparent design confines plaintext to a small hardware enclave, blocking four attack classes, including tool-call rewriting and credential exfiltration. In a seeded audit, two coding agents found 8 and 10, respectively, of 10 planted invariant violations.

Jun 16, 2026 1 source
Artificial Intelligence #llm#security

SkillVetBench Uses LLM-as-Judge to Evaluate Security Risks in Open-Source Agent Skills

SkillVetBench, a live Hugging Face leaderboard, uses an LLM-as-Judge approach to vet open-source LLM agent skills for security risks. It introduces the Skill Agentic Risk Score (SARS) and integrates CVSS v4.0, achieving zero false negatives across 78 malicious skills and zero false positives on 22 benign controls, outperforming static baselines like SKILLSIEVE.

Jun 16, 2026 1 source
Artificial Intelligence #llm#agents

CoffeeBench: New Benchmark Evaluates LLM Agents in Multi-Agent Economic Simulations

Researchers introduce CoffeeBench, a benchmark for evaluating LLM agents in a long-horizon multi-agent economy. The 90-day simulation features farmers, roasters, and retailers, with each evaluated model controlling one roaster. All models outperformed a passive baseline, but Claude Haiku 4.5 exhibited an idle-drift failure mode.

Jun 16, 2026 1 source
Artificial Intelligence #kv cache#compression

PolyKV: Layer-Wise KV Cache Compression Recovers Up to 54.5% of the Gap to Full KV Cache

PolyKV is a new framework for compressing the key-value cache in large language model inference. It selects a compression policy per transformer layer and allocates non-uniform cache budgets, outperforming uniform approaches. On LongBench tasks, PolyKV recovers 40% to 54.5% of the performance gap between the strongest single-policy baseline and the full KV cache.

Jun 16, 2026 1 source
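The "share of the gap recovered" is a normalized score. It measures how far a method moves from the strongest single-policy baseline toward the full-KV-cache result: (method − baseline) / (full cache − baseline). The minimal sketch below assumes higher scores are better. The scores in the example are made-up placeholders, not results from the paper.

```python
def gap_recovered(method: float, baseline: float, full_cache: float) -> float:
    """Fraction of the (full_cache - baseline) gap closed by method (higher-is-better scores)."""
    gap = full_cache - baseline
    if gap == 0:
        raise ValueError("baseline already matches the full KV cache; gap is undefined")
    return (method - baseline) / gap

# Hypothetical scores: baseline 40.0, full cache 44.0, compressed method 42.0.
print(f"{gap_recovered(42.0, 40.0, 44.0):.1%}")  # → 50.0%
```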
Artificial Intelligence #art therapy#emotional dynamics

EC-Script: New LLM Agent Framework Offers Controllable Emotional Trajectories for Narrative Generation

Researchers propose EC-Script, an LLM agent-based framework that enables hierarchical control of affective trajectories in narrative generation. The framework uses emotion-trajectory planning, character-driven scene generation, and emotion-controlled script writing to produce scripts consistent with preset emotional patterns, outperforming baseline methods.

Jun 16, 2026 1 source
Artificial Intelligence #llm#demand simulation

LLM-Powered Virtual Population Model Simulates Demand for Smarter Pricing Decisions

Researchers developed an LLM-powered virtual population model that simulates demand for pricing decisions by combining customer personas with product descriptions and images. The model provides not just point forecasts but full predictive demand distributions, enabling risk-aware pricing strategies. Tested on H&M fashion data, it outperformed other models in predictive accuracy.

Jun 16, 2026 1 source
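A full predictive demand distribution lets a pricing rule look beyond the mean. The sketch below compares two rules. The first picks the price with the highest expected revenue. The second picks the price with the best conditional value-at-risk (CVaR), the mean of the worst 20% of revenue outcomes. The price points and demand samples are invented toy numbers standing in for draws from a simulated population. CVaR is one common risk measure, chosen here for illustration. The summary does not say which risk criterion the authors use.

```python
from statistics import mean

def revenues(price: float, demand_samples: list[float]) -> list[float]:
    """Revenue for each sampled demand outcome at a given price."""
    return [price * d for d in demand_samples]

def cvar(values: list[float], alpha: float = 0.2) -> float:
    """Mean of the worst alpha-fraction of outcomes (lower tail)."""
    k = max(1, int(alpha * len(values)))
    return mean(sorted(values)[:k])

# Hypothetical demand draws per candidate price (toy numbers, not from the paper).
demand = {
    10.0: [100, 100, 100, 100, 100],
    15.0: [20, 40, 80, 100, 120],
}

best_mean = max(demand, key=lambda p: mean(revenues(p, demand[p])))
best_cvar = max(demand, key=lambda p: cvar(revenues(p, demand[p])))
print(best_mean, best_cvar)  # → 15.0 10.0
```

The higher price wins on average revenue (1080 vs. 1000) but has a much worse tail (300 vs. 1000). A point forecast alone would hide that trade-off.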
Artificial Intelligence #llm#jailbreak

GAS-Leak-LLM: Genetic Algorithm Jailbreaks Black-Box LLMs, Exposing Safety Gaps

A new research paper introduces GAS-Leak-LLM, a genetic algorithm-based attack that evolves adversarial suffixes to bypass LLM safety constraints in a strict black-box setting. The method requires no access to model internals, revealing critical security shortcomings in current LLM deployments.

Jun 16, 2026 1 source
Artificial Intelligence #multi-agent#planning

Tensor-Coord: Algebraic Decomposition Enables Conflict-Free Multi-Agent LLM Planning

A new research paper introduces Tensor-Coord, a multilinear algebra framework that represents joint plans of multiple LLM agents as a third-order tensor. By decomposing the tensor, it identifies coordination conflicts and enables iterative replanning, achieving 100% conflict-free plans for 2-agent tasks and 80% for 3-agent tasks in simulated delivery scenarios.

Jun 16, 2026 1 source
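The sketch below makes the "joint plan as a third-order tensor" idea concrete. It encodes plans as a binary tensor indexed by agent, time step, and resource (for example, a delivery location), then flags any (time, resource) cell claimed by more than one agent. The choice of axes is an assumption. This direct collision check stands in for the tensor decomposition Tensor-Coord uses to identify conflicts; it does not reproduce it.

```python
import numpy as np

def find_conflicts(plan: np.ndarray) -> list[tuple[int, int, list[int]]]:
    """plan[a, t, r] == 1 if agent a uses resource r at time t.

    Returns (time, resource, agents) for every cell used by two or more agents.
    """
    load = plan.sum(axis=0)  # shape (T, R): how many agents claim each cell
    conflicts = []
    for t, r in zip(*np.nonzero(load > 1)):
        agents = np.nonzero(plan[:, t, r])[0].tolist()
        conflicts.append((int(t), int(r), agents))
    return conflicts

# 2 agents, 3 time steps, 2 resources (hypothetical toy plan).
plan = np.zeros((2, 3, 2), dtype=int)
plan[0, 0, 0] = plan[0, 1, 1] = plan[0, 2, 0] = 1
plan[1, 0, 1] = plan[1, 1, 1] = plan[1, 2, 1] = 1
print(find_conflicts(plan))  # → [(1, 1, [0, 1])]
```

A replanning loop would revise one of the listed agents' plans and re-run the check until the list is empty.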
Artificial Intelligence #spokes#diverse pretraining

Spokes Optimizes Diverse Pretraining Data Selection for LLMs, Boosting Performance

Researchers introduce Spokes, a method that directly optimizes diversity in pretraining data selection for large language models. Using a probabilistic framework based on the G-Vendi score and exponentiated gradient descent, Spokes achieves significantly more diverse subsets and improves downstream performance by up to 1.5 points over random sampling.

Jun 16, 2026 1 source
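The Vendi score measures diversity as the exponential of the entropy of the eigenvalues of a normalized similarity matrix. Assigning each candidate a sampling probability turns it into a weighted score. That score can then be raised with exponentiated-gradient (multiplicative) updates on the probability simplex. The sketch below follows this general recipe under stated assumptions:

- The similarity matrix `K` is supplied directly. G-Vendi applies the score to gradient-based example representations, which are not computed here.
- The gradient is taken by finite differences for brevity.
- `lr` and the step count are arbitrary.

It is not Spokes's implementation.

```python
import numpy as np

def weighted_vendi(p: np.ndarray, K: np.ndarray) -> float:
    """exp(entropy) of the eigenvalues of diag(sqrt p) K diag(sqrt p); K has ones on its diagonal."""
    s = np.sqrt(p)
    lam = np.linalg.eigvalsh(s[:, None] * K * s[None, :])
    lam = lam[lam > 1e-12]  # drop numerical zeros before taking logs
    return float(np.exp(-np.sum(lam * np.log(lam))))

def eg_step(p: np.ndarray, K: np.ndarray, lr: float = 0.5, eps: float = 1e-6) -> np.ndarray:
    """One exponentiated-gradient ascent step on log(Vendi), i.e. the eigenvalue entropy."""
    base = np.log(weighted_vendi(p, K))
    grad = np.empty_like(p)
    for i in range(len(p)):
        q = p.copy()
        q[i] += eps
        grad[i] = (np.log(weighted_vendi(q, K)) - base) / eps  # finite-difference estimate
    p_new = p * np.exp(lr * grad)  # multiplicative update keeps weights positive
    return p_new / p_new.sum()     # renormalize onto the probability simplex

# Toy pool: items 0 and 1 are exact duplicates, item 2 is unrelated.
K = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
p = np.full(3, 1 / 3)
before = weighted_vendi(p, K)
for _ in range(50):
    p = eg_step(p, K)
print(round(before, 3), round(weighted_vendi(p, K), 3), p.round(3))
```

Starting from uniform sampling, the updates shift probability away from the duplicated pair and toward the unrelated item. This raises the weighted diversity score.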
Artificial Intelligence #llm#artificial intelligence

Medical Heuristic Learning: LLM-Driven Framework for Interpretable Clinical Decision Rules

Researchers propose Medical Heuristic Learning (MHL), an LLM-driven framework that generates interpretable, auditable Python decision rules for clinical tabular prediction. MHL achieves performance comparable to state-of-the-art methods while maintaining transparency and adaptability under data drift.

Jun 16, 2026 1 source
Artificial Intelligence #llm#reasoning

AdaSTORM Breakthrough Scales LLM Reasoning to Thousand-Node Dynamic Graphs, Paves Way for Supply Chain AI

AdaSTORM, a new multi-agent AI framework, scales large language model reasoning to dynamic graphs of up to a thousand nodes with over 90% accuracy. The approach uses adaptive partitioning and collaborative reasoning to overcome the limitations of current LLMs, which can handle only tens of nodes. This breakthrough could enable AI-driven analysis of complex, evolving networks such as supply chains.

Jun 16, 2026 1 source
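A basic way to keep each LLM call within a manageable graph size is to cut the graph into connected chunks with a node cap and let separate agents reason over each chunk. The sketch below grows chunks by breadth-first search up to `max_nodes`. The cap, the BFS strategy, the function name, and the assumption of an undirected adjacency list with every node as a key are all illustrative. The summary does not describe AdaSTORM's adaptive partitioning in enough detail to reproduce it, including how it tracks a changing graph over time.

```python
from collections import deque

def bfs_partition(adj: dict[int, list[int]], max_nodes: int) -> list[list[int]]:
    """Split a graph into connected chunks of at most max_nodes, each grown by BFS."""
    assigned: set[int] = set()
    parts: list[list[int]] = []
    for start in sorted(adj):
        if start in assigned:
            continue
        part: list[int] = []
        queue = deque([start])
        assigned.add(start)
        while queue:
            node = queue.popleft()
            part.append(node)
            for nb in adj[node]:
                # Only enqueue while the chunk plus pending nodes stays under the cap.
                if nb not in assigned and len(part) + len(queue) < max_nodes:
                    assigned.add(nb)
                    queue.append(nb)
        parts.append(part)
    return parts

# A 7-node path graph 0-1-2-3-4-5-6 (toy example).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(bfs_partition(path, max_nodes=3))  # → [[0, 1, 2], [3, 4, 5], [6]]
```

Each chunk could then go to its own agent, with edges that cross chunk boundaries handled in a separate collaboration step.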