Topic

natural language processing

31 stories

Artificial Intelligence #drug-disease#applicability condition

New AI Method Extracts Applicability Conditions for Therapeutic Drug-Disease Relations

Researchers introduce the task of applicability condition extraction for drug-disease relations, creating a dataset of 1,119 drug-disease pairs and a LoRA-enhanced method that outperforms existing approaches.

Jul 8, 2026 1 source
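
The LoRA component mentioned above refers to low-rank adaptation, a general fine-tuning technique. The sketch below is a generic illustration, not the authors' extraction model. It assumes a frozen weight matrix `W`, trainable low-rank factors `A` and `B` of rank r, and the common scaling factor alpha / r, so an adapted layer computes W x + (alpha / r) B A x. All matrices and values are toy assumptions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute W x + (alpha / r) * B (A x) for a LoRA-adapted linear layer.

    W: frozen d_out x d_in pretrained weight
    A: trainable r x d_in down-projection
    B: trainable d_out x r up-projection (zero-initialised in standard LoRA)
    """
    r = len(A)                          # LoRA rank = number of rows in A
    base = matvec(W, x)                 # frozen pretrained path
    update = matvec(B, matvec(A, x))    # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy example: 2x3 frozen weight with a rank-1 adapter
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]   # 1 x 3
B = [[0.5], [0.0]]      # 2 x 1
x = [1.0, 2.0, 3.0]
print(lora_forward(W, A, B, x, alpha=1.0))  # [4.0, 2.0]
```

Only `A` and `B` are trained, so a d_out x d_in layer adds r(d_in + d_out) trainable parameters instead of d_in x d_out, which is what makes this kind of adaptation cheap.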

Research Challenges Assumption That Linguistic Relatedness Boosts Cross-Lingual AI Transfer

Technology

Artificial Intelligence #cross-lingual#transfer learning

A study of seven large language models (4B–671B parameters) fine-tuned on Arabic found no evidence of Semitic-specific transfer in zero-shot reading comprehension. Improvements across all languages, regardless of linguistic relatedness, suggest that task-format alignment—not cross-lingual knowledge transfer—drives the gains. The findings challenge assumptions underlying multilingual AI deployments in enterprise applications.

Jul 8, 2026 1 source

Toten Framework Outperforms Statistical Tokenization for Physical Quantities in Brazilian Portuguese Technical Texts

Technology

Artificial Intelligence #tokenization#ontology

Researchers present Toten, a framework that replaces statistical tokenization with ontology-based classification for physical quantities and technical notation in Brazilian Portuguese. The system leverages external oracles and achieves higher ontological atomicity and better numerical reconstruction than state-of-the-art baselines.

Jul 8, 2026 1 source

LLM Paraphrase Augmentation Boosts Sign Language Translation Performance

Technology

Artificial Intelligence #sign language#translation

A new study proposes using a large language model (GPT-4o) to generate controlled paraphrase variants of training targets for sign language translation (SLT). Evaluated on three datasets, the method yields a modest BLEU-4 improvement on PHOENIX14T, along with gains in semantic fidelity that lexical metrics do not capture.

Jun 21, 2026 1 source

Hierarchical BART strategy achieves state-of-the-art Vietnamese multi-document summarization

Technology

Artificial Intelligence #bart#natural language processing

A research team presents a novel hierarchical BART-based strategy for Vietnamese multi-document abstractive summarization, achieving a ROUGE-2 F1 score of 0.2468 on the VLSP 2022 public test set. The approach condenses documents under the guidance of a gold-standard summary, producing fluent and concise outputs, and the team releases additional training data to the community.

Jun 21, 2026 1 source
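
ROUGE-2 F1, the metric reported in the BART story, scores bigram overlap between a generated summary and a reference. The sketch below is a simplified version that assumes lowercase whitespace tokenization and a single reference; the official ROUGE toolkit and Vietnamese word segmentation handle tokens differently, so its scores will not match published numbers exactly.

```python
from collections import Counter
from typing import List


def bigrams(tokens: List[str]) -> Counter:
    """Count adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-2 F1 with clipped bigram matches."""
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    overlap = sum((cand & ref).values())  # Counter '&' keeps min counts (clipping)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(round(rouge2_f1("the cat sat on the mat", "the cat is on the mat"), 4))  # 0.6
```

Here 3 of the 5 candidate bigrams appear in the reference, so precision, recall, and F1 are all 0.6.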

MoCA-Agent: Market-of-Claims Code Agent Achieves Strong Results in Financial and Numerical Reasoning

Technology

Artificial Intelligence #ai#code agent

An arXiv paper introduces MoCA-Agent, a market-of-claims code agent that decomposes questions into atomic claims and uses trader agents to buy or sell those claims. It performed strongly on ten benchmarks, including FinQA (78.3%), FinanceMath (76.0%), and FinChart-Bench (85.6%).

Jun 20, 2026 1 source

New Framework MACR Resolves Knowledge Conflicts in LLMs Using Multi-Agent Reasoning

Technology

Artificial Intelligence #llms#knowledge conflict

A research paper proposes MACR, a novel framework for resolving knowledge conflicts in large language models (LLMs). Unlike existing approaches that privilege either internal parametric knowledge or external context, MACR uses adaptive knowledge assessment and a multi-agent reasoning system to explicitly identify and resolve inconsistencies. Empirical results show that MACR significantly outperforms state-of-the-art baselines while providing interpretable conflict resolutions.

Jun 20, 2026 1 source

Large Language Models Can Read Compressed Text That Humans Cannot, Researchers Find

Technology

Artificial Intelligence #large language models#artificial intelligence

A new research paper introduces BabelTele, a compact, non-human-readable text format that large language models can still interpret with high semantic fidelity. The approach compresses text to 27.9% of its original length while preserving 99.5% of its meaning, potentially reducing context overhead and costs in enterprise AI deployments.

Jun 20, 2026 1 source

From Texts to Scores: Tracing the Emergence of Essay Quality Representations in Large Language Models

Technology

Artificial Intelligence #large language models#essay scoring

A study by Zuo et al. systematically analyzes hidden representations of eight LLMs across three essay datasets, finding that essay quality information is linearly decodable, emerges progressively across layers, and is robust to prompting strategies. The research identifies individual 'essay scoring neurons' and shows that their distribution shifts with essay length, offering insights into the interpretability of LLM-based automated essay scoring systems.

Jun 20, 2026 1 source

IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources

Technology

Artificial Intelligence #natural language processing#persian nlp

Researchers present IHUBERT, a monolingual Persian language model pretrained on a 45GB curated subset of the Sepahr-Danesh collection using a multi-stage pipeline that includes vector-database-based semantic deduplication and domain-balanced pretraining. IHUBERT achieves top scores on the extractive QA benchmarks PQuAD and ParsiNLU-RC and the best results on FarsTail NLI, while remaining competitive on NER and topic classification.

Jun 20, 2026 1 source

LLM-Based A/B Testing Needs Calibration: New Statistical Framework Finds Raw LLM Predictions Recover Only 39% of the Human Effect

Technology

Artificial Intelligence #llm#a/b testing

A new paper on arXiv develops a statistical framework for using large language models (LLMs) as surrogates for human participants in A/B tests. The framework adapts surrogate endpoint theory and shows that raw LLM predictions recover only 39% of the human treatment effect, a gap that calibration can close. The study cautions that LLM-based A/B testing yields correct results only by assumption, whereas human testing is correct by design.

Jun 20, 2026 1 source
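
The calibration idea in the A/B testing story can be illustrated with a deliberately simple model. This sketch assumes, purely for illustration, that the LLM-estimated treatment effect is proportional to the human effect (llm ≈ k · human), estimates k from hypothetical past experiments where both were measured, and divides a new LLM estimate by k. It is not the paper's surrogate-endpoint framework, and it has exactly the weakness the authors flag: the correction is valid only if the assumed relationship holds for the new test.

```python
from statistics import mean
from typing import List, Tuple


def treatment_effect(control: List[float], treatment: List[float]) -> float:
    """Average treatment effect as a difference in arm means."""
    return mean(treatment) - mean(control)


def fit_attenuation(pairs: List[Tuple[float, float]]) -> float:
    """Least-squares slope through the origin for llm_effect ≈ k * human_effect.

    `pairs` holds (human_effect, llm_effect) from past experiments
    where both were measured (a hypothetical calibration set).
    """
    num = sum(h * l for h, l in pairs)
    den = sum(h * h for h, _ in pairs)
    return num / den


def calibrated_effect(llm_effect: float, k: float) -> float:
    """Rescale an LLM-estimated effect back to the human scale (assumes k > 0)."""
    return llm_effect / k


# Hypothetical history in which LLM effects are 39% of human effects
history = [(1.0, 0.39), (2.0, 0.78), (0.5, 0.195)]
k = fit_attenuation(history)
new_llm_effect = treatment_effect([0.10, 0.12], [0.14, 0.16])
print(round(k, 3), round(calibrated_effect(new_llm_effect, k), 3))
```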

New Benchmark Reveals Remote Sensing AI Models Fail at Negation Comprehension

Technology

Artificial Intelligence #ai#multimodal

A new study introduces RS-Neg, the first benchmark to evaluate negation comprehension in remote sensing multimodal large language models. The evaluation reveals that advanced models exhibit hallucinations and performance degradation when handling negation. The proposed NeFo method, which uses only about 5% of the unlabeled test samples, significantly improves negation understanding, with implications for critical applications such as emergency response and logistics.

Jun 20, 2026 1 source

How Do Instructions Shape Speech? New Cross-Attribution Method Reveals Style Control in TTS

Technology

Artificial Intelligence #text-to-speech#style-captioned

A research paper introduces cross-attention attribution for style-captioned text-to-speech, adapting the DAAM framework to speech diffusion models. The method extracts per-token heatmaps across layers and steps, analyzing 3,600 combinations to reveal how caption tokens influence waveforms. Key findings include lower temporal variance for style tokens, correlation with F0 and energy, and peak style conditioning in early ODE steps and deep layers.

Jun 20, 2026 2 sources

CombEval: A Framework for Evaluating Combinatorial Counting in Large Language Models

Technology

Artificial Intelligence #large language models#combinatorial counting

CombEval is a dynamic benchmark for evaluating combinatorial counting in large language models. It uses typed Cofola specifications to generate problems with verified answers. Tests on 11 LLMs reveal persistent failures on ordered objects, indistinguishable elements, and nested dependencies.

Jun 20, 2026 1 source

Fast LLM-Based Semantic Filtering: Unified Framework and Adaptive Two-Phase Method Deliver 1.6–2.0x Speed Gains

Technology

Artificial Intelligence #llm#semantic filtering

A new research paper from Kim, Catheland, and Ailamaki introduces a unified framework and an adaptive two-phase method for LLM-based semantic filtering. By adaptively composing model-free clustering with online-trained proxies, and by reusing oracle confidence for multiple purposes, the method runs 1.6–2.0x faster than prior cascades while meeting a 90% accuracy target on 95% of queries across three 10K-document corpora.

Jun 16, 2026 1 source
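
The cascades that the semantic filtering paper compares against share a basic shape: a cheap proxy scores every document, confident scores are decided immediately, and only uncertain documents go to the expensive LLM oracle. The sketch below shows that generic fixed-threshold cascade, not the authors' adaptive two-phase method; `proxy_score`, `oracle`, and the thresholds are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple


def cascade_filter(
    docs: List[str],
    proxy_score: Callable[[str], float],
    oracle: Callable[[str], bool],
    low: float = 0.2,
    high: float = 0.8,
) -> Tuple[List[str], int]:
    """Keep documents that satisfy a predicate, calling the oracle only when the proxy is unsure.

    Returns the kept documents and the number of oracle calls made.
    """
    kept: List[str] = []
    oracle_calls = 0
    for doc in docs:
        score = proxy_score(doc)
        if score >= high:
            kept.append(doc)          # proxy confidently says "match"
        elif score <= low:
            continue                  # proxy confidently says "no match"
        else:
            oracle_calls += 1         # uncertain: pay for the expensive check
            if oracle(doc):
                kept.append(doc)
    return kept, oracle_calls


# Hypothetical proxy scores and a stand-in oracle for "is this about refunds?"
scores = {"refund request": 0.95, "shipping delay": 0.05,
          "partial refund question": 0.5, "hello": 0.1}
kept, calls = cascade_filter(list(scores), scores.__getitem__, lambda d: "refund" in d)
print(kept, calls)  # ['refund request', 'partial refund question'] 1
```

Speedups in such systems come from shrinking the uncertain band without letting accuracy fall below a target; choosing `low` and `high` adaptively per query is the part this sketch leaves out.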

Vocabulary Dropout Technique Prevents Diversity Collapse in LLM Co-Evolution Training

Technology

Artificial Intelligence #llm#vocabulary dropout

A new method called vocabulary dropout prevents diversity collapse in co-evolutionary LLM training. Applied to Qwen3 models on mathematical reasoning, it improved solver performance by an average of 4.4 points, with the largest gains on competition-level benchmarks.

Jun 16, 2026 1 source
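
The vocabulary dropout summary does not say how the dropout is applied. One plausible reading, assumed here and possibly different from the paper, is that a random subset of vocabulary tokens is masked out before sampling, so the generator cannot keep falling back on the same high-probability tokens. The function below is a hypothetical sketch of that random vocabulary masking over a list of logits.

```python
import math
import random
from typing import List


def sample_with_vocab_dropout(logits: List[float], drop_prob: float, rng: random.Random) -> int:
    """Sample a token id after randomly masking part of the vocabulary (hypothetical sketch).

    Each token is independently dropped with probability `drop_prob` and then
    receives zero probability. If every token is dropped, the top-scoring
    token is kept so sampling is always possible.
    """
    keep = [rng.random() >= drop_prob for _ in logits]
    if not any(keep):
        keep[max(range(len(logits)), key=logits.__getitem__)] = True
    # Softmax over the kept tokens only; subtract the max for numerical stability
    top = max(logit for logit, kept in zip(logits, keep) if kept)
    weights = [math.exp(logit - top) if kept else 0.0 for logit, kept in zip(logits, keep)]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
print([sample_with_vocab_dropout([2.0, 1.0, 0.5, 0.1], 0.3, rng) for _ in range(5)])
```

Setting `drop_prob=0.0` recovers ordinary softmax sampling over the full vocabulary.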

Study Reveals 27 Error Types in LLM Text-to-SQL, Introduces MapleDoctor Repair Framework

Technology

Artificial Intelligence #text-to-sql#in-context-learning

Researchers conducted the first comprehensive study of errors in LLM-based text-to-SQL systems that use in-context learning. They identified 27 error types across 7 categories and proposed MapleDoctor, a detection and repair framework that outperforms existing solutions: it repairs 13.8% more queries with negligible mis-repairs and reduces repair latency by 67.4%.

Jun 16, 2026 1 source

MA-ProofBench: New Benchmark Tests LLMs on Formal Theorem Proving in Mathematical Analysis

Technology

Artificial Intelligence #large language models#theorem proving

Researchers introduce MA-ProofBench, the first formal theorem-proving benchmark dedicated to mathematical analysis. It contains 200 theorems across six topics at two difficulty levels. Evaluations show that even the best model, GPT-5.5, achieves only 16% Pass@8 on undergraduate-level problems and 5% on Ph.D.-level problems, highlighting significant limitations of current LLMs in formal mathematical reasoning.

Jun 16, 2026 1 source
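
Pass@k figures like the Pass@8 scores in the MA-ProofBench story are commonly computed with the unbiased estimator introduced in the Codex code-generation evaluation: with n sampled attempts of which c are verified, pass@k = 1 - C(n-c, k) / C(n, k). The benchmark's exact protocol is not described here, so treat this as a sketch of the standard estimator; when exactly k = n attempts are drawn, it reduces to whether any attempt succeeded.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n attempts sampled, c of them verified."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one verified attempt
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given as (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Four hypothetical theorems, 8 proof attempts each; only one gets a verified proof
print(benchmark_pass_at_k([(8, 0), (8, 1), (8, 0), (8, 0)], k=8))  # 0.25
```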

LLM-Assisted Stance Detection in Scientific Discourse Reaches 0.76 Combined Reliability Score

Technology

Artificial Intelligence #llm#stance detection

Researchers used GPT-5.1, Claude Sonnet 4.6, and Gemini 3 Pro to detect whether scientific authors treat Bayesian models as realistic or instrumental. The LLMs achieved a held-out combined reliability of 0.76 and near-perfect article-level rank stability (r = 0.96–0.97). The study demonstrates a scalable method for theoretically demanding qualitative coding.

Jun 16, 2026 1 source

PVminerLLM2 Uses Preference Optimization to Improve Structured Patient Voice Extraction

Technology

Artificial Intelligence #llm#patient voice

Researchers introduce PVminerLLM2, an improved set of LLMs for structured extraction of patient voice from unstructured text. The model uses preference optimization with token-level gated stabilization and confusion-aware pair construction to outperform supervised fine-tuning baselines. The code and trained models are publicly available.

Jun 16, 2026 1 source

UXBench: Measuring the Actionability of LLM-Generated UX Critiques

Technology

Artificial Intelligence #llms#ux

UXBench evaluates the actionability of LLM-generated UX critiques. It uses web fixtures spanning ten product-surface families and measures whether repair agents can use the critiques to improve the interfaces. Results show that models vary significantly in reliability.

Jun 16, 2026 1 source

New AI Framework ARVRE Generates Complex, Solvable Physics Word Problems Using Reinforcement Learning and Retrieval

Technology

Artificial Intelligence #agentic retrieval#reinforcement learning

Researchers introduce ARVRE (Agentic Retrieval Value Reinforced Equation-chain), a two-stage framework that generates complex and mathematically valid physics word problems by combining offline temporal-difference learning for equation chains, agentic retrieval-augmented generation for concept selection, and a large language model for natural language output. Human and automated evaluations show ARVRE outperforms existing approaches in complexity, novelty, and solvability.

Jun 16, 2026 1 source

Privacy-Preserving Text Sanitization for Distributed Agents via Disentangled Representations

Technology

Artificial Intelligence #privacy-preserving#text sanitization

Researchers propose DiSan, a privacy-preserving text sanitization framework that uses disentangled representations to separate task semantics from style identifiers. Experiments show that it reduces exposure of personally identifiable information 20-fold while maintaining 83% answer faithfulness on a multi-agent RAG benchmark, outperforming token-level masking.

Jun 16, 2026 1 source

Who Should Lead Decoding Now? Tracking Reliable Trajectories for Ensembling Masked Diffusion Language Models

Technology

Artificial Intelligence #artificial intelligence#language models

Masked Diffusion Language Models (MDLMs) have emerged as a distinct paradigm for sequence generation, but combining their knowledge is an underexplored problem. Researchers introduce TIE (Trajectory-based Iterative Ensembling), a framework that tracks confidence dynamics over answer-relevant positions to relay decoding trajectories between models, achieving strong performance on diverse reasoning tasks.

Jun 16, 2026 1 source

VibeThinker-3B: Small Language Model Matches Giants in Verifiable Reasoning, According to arXiv Paper

Technology

Artificial Intelligence #vibethinker-3b#small language model

A new technical report on arXiv introduces VibeThinker-3B, a compact 3B-parameter language model that achieves verifiable reasoning scores comparable to models orders of magnitude larger, including DeepSeek V3.2, GLM-5, and Gemini 3 Pro. The model uses a Spectrum-to-Signal post-training paradigm and achieves 94.3 on AIME26 and 80.2% Pass@1 on LiveCodeBench v6.

Jun 16, 2026 1 source

Vernier Research Reveals Why Language Models Give Inconsistent Answers to Causal Questions After Variable Renaming

Technology

Artificial Intelligence #artificial intelligence#causal reasoning

Researchers introduce Vernier, a probing technique that reveals representational misalignment in instruction-tuned language models when variable names are replaced with placeholders, causing inconsistent answers to causal reasoning questions. The study tests models including Qwen-7B, Qwen-14B, and Llama-3.1-8B, and finds that success is bounded by model family, scale, and task.

Jun 16, 2026 1 source

Do LLMs Reliably Identify Correct Information Units in Aphasic Discourse? A New Study Evaluates Four Models

Technology

Artificial Intelligence #llms#aphasia

A study examined whether instruction-tuned large language models (LLMs) can reliably perform token-level classification of Correct Information Units (CIUs) in aphasic discourse transcripts. Four models (Llama-3.1-8B, Qwen2.5-7B, Mistral-7B, and Phi-3-mini) were tested under zero-shot and few-shot prompting conditions. Few-shot prompting yielded competitive mean F1 scores between 0.776 and 0.817 for three models, but zero-shot prompting was insufficient and Phi-3-mini performed unstably. The authors recommend a human-in-the-loop approach to automated CIU scoring.

Jun 16, 2026 1 source

Koshur Diacritizer: A Byte-Level Model Restores Diacritics for Kashmiri Language NLP

Technology

Artificial Intelligence #kashmiri#diacritic restoration

Researchers have developed Koshur Diacritizer, a byte-level sequence-to-sequence model based on ByT5-small, to restore missing diacritic marks in Kashmiri digital text. The model, trained on 23,700 sentence pairs, achieves a DERm of 0.2012 and word error rate of 0.2159, with a native expert accuracy of 77.5%. The dataset, model, and source code are publicly released to support low-resource language research.

Jun 16, 2026 1 source
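
Word error rate, one of the metrics reported for Koshur Diacritizer, is the word-level edit distance between the model output and the reference, divided by the number of reference words. For diacritic restoration, a word with any wrong or missing mark counts as a substitution. The sketch assumes whitespace tokenization and uses placeholder tokens rather than Kashmiri text; the DERm metric is not reproduced because the summary does not define it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # match or substitution
    return dist[len(ref)][len(hyp)] / len(ref)


# One of four placeholder words restored with the wrong mark -> one substitution
print(word_error_rate("word1 word2 word3 word4", "word1 wórd2 word3 word4"))  # 0.25
```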

AI-Driven Test Case Generation from Natural Language: Survey Reveals Six Quality Gaps and Research Roadmap

Technology

Artificial Intelligence #ai#test-case-generation

A systematic review of 21 primary studies on AI-driven test case generation from natural language requirements reveals that no existing approach simultaneously satisfies six key quality dimensions: automation, ambiguity handling, domain applicability, traceability, evaluation thoroughness, and hallucination control. The survey synthesizes three evolutionary eras and proposes four actionable research guidelines targeting hallucination, traceability, complexity sensitivity, and compliance.

Jun 16, 2026 1 source

Expert Tying Reduces Memory Footprint of Mixture-of-Experts LLMs by Nearly Half

Technology

Artificial Intelligence #tied expert layers#mixture-of-experts

A new arXiv paper from Jaggi proposes Expert Tying, an architectural modification for Mixture-of-Experts LLMs that shares expert parameters across consecutive transformer layers. Pretraining experiments on OLMoE, Qwen3, and DeepSeek-style architectures show that it reduces the memory footprint by almost 2x with virtually no degradation in perplexity or downstream quality.

Jun 16, 2026 1 source
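
The near-2x memory saving reported for Expert Tying follows from simple parameter accounting. The sketch below assumes, hypothetically, that each group of `tie_group` consecutive layers shares one set of experts while the non-expert weights (attention, norms, routers) stay untied. With tie_group = 2 the expert parameters halve, and the total approaches a 2x reduction as experts dominate the parameter count. All sizes are illustrative, not taken from the paper.

```python
def moe_param_count(
    num_layers: int,
    experts_per_layer: int,
    params_per_expert: int,
    other_params_per_layer: int,
    tie_group: int = 1,
) -> int:
    """Total parameters when every `tie_group` consecutive layers share one expert set."""
    if num_layers % tie_group != 0:
        raise ValueError("num_layers must be divisible by tie_group")
    unique_expert_sets = num_layers // tie_group
    expert_params = unique_expert_sets * experts_per_layer * params_per_expert
    other_params = num_layers * other_params_per_layer  # assumed untied per layer
    return expert_params + other_params


untied = moe_param_count(16, 64, 1_000_000, 4_000_000)
tied = moe_param_count(16, 64, 1_000_000, 4_000_000, tie_group=2)
print(untied, tied, round(untied / tied, 2))  # 1088000000 576000000 1.89
```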

How Multi-Label Classification and Generative AI Scale User Feedback Analysis

Technology

Artificial Intelligence #ai#machine learning

A research paper on arXiv details how a major software company used supervised machine learning for multi-label topic classification and generative AI for summarization to efficiently process large volumes of user feedback. The study found that sentiment analysis alone does not reliably indicate user satisfaction, emphasizing the need for explicit satisfaction surveys.

Jun 16, 2026 1 source