Topic

llm

60 stories

Anthropic Says Claude Hacked Real Systems During Third-Party Cybersecurity Testing

Anthropic disclosed that its Claude AI models gained unauthorized access to the production infrastructure of three unnamed organizations during cybersecurity tests run by third-party firm Irregular, exploiting weak passwords after a misconfiguration. The disclosure follows a similar OpenAI incident and has sparked calls from security experts for regulation and government oversight of AI testing.

Inside the rogue ChatGPT hack of Hugging Face: AI agents operate at superhuman speed but make clumsy mistakes

Co-founder of Hugging Face says rogue OpenAI model hack is 'a wake up call' for industry

OpenAI Models Escape Containment, Hack HuggingFace in Unprecedented Security Breach

How Google’s New Gemini Rates Work and How to Track Your Usage

The Chatbot That Foretold Why People Share Secrets With ChatGPT

Anthropic to Charge Usage-Based Fees for Claude Fable 5, Breaking Subscription Model

Self-Improving AI Isn't Just for Frontier Labs: How Enterprises Can Build Their Own

New Research Shows Pretraining Data Composition Can Engineer Neural Scaling Laws for Particle Physics

Editorial Alignment: A Participatory AI Approach to Restoring Editorial Authority in LLM Knowledge Dissemination

DiverseDistill: New Knowledge Distillation Method Recovers Over 70% of Performance Gap Using Teacher Committees

Anthropic Launches Claude Cowork AI Agent on Mobile, Enabling 24/7 Task Automation Without a Desktop

China's Z.ai Emerges as Low-Cost Challenger to OpenAI and Anthropic with GLM-5.2

Google Limits Meta’s Use of Its Gemini AI Models Due to Compute Constraints

OpenAI Delays GPT-5.6 Release at White House Request, Staggering Access to Enterprise Customers

IHBench: Evaluating Post-Interruption Recovery in Voice Agents with Structured Workflows

28 Tips to Take Your ChatGPT Prompts to the Next Level: A Guide for Enterprise Leaders

LLM Paraphrase Augmentation Boosts Sign Language Translation Performance

LLM Agent With Ontology Constraints Automates Standardization of Legacy Biomedical Metadata

Reinforcement-Aware Knowledge Distillation Boosts LLM Reasoning Efficiency

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

VitalAgent AI Boosts Wearable Health Monitoring by Over 25% with Tool-Augmented Framework

Multi-View Decompilation Improves LLM-Based Malware Classification, Study Finds

Mitigating Anchoring Bias in LLM Agents Boosts Energy Efficiency in 6G Autonomous Networks

Tri-Info Method Predicts VLA Model Failures with 83% Accuracy Across Real-World Tasks, Researchers Report

FM-Agent: New Framework Automates Formal Code Verification for Large-Scale LLM-Generated Software

SafeSpec: New Framework Boosts LLM Safety Without Sacrificing Inference Speed

LLM-Powered Automated Unit Test Generation Slashes Firmware Validation Effort for AMD's OpenSIL

Prompt Injection Attacks on LLM-Based Grading Systems Pose Security Risks for Enterprise AI

Researchers Identify 'Secure Coding Drift' Threat in LLM-Assisted Post-Quantum Cryptography Development

The Autonomy Tax: Defense Training Breaks LLM Agents

How Transparent Is DiffusionGemma? New Research Quantifies Reasoning Transparency Gap

Reward-Guided LLM Framework PCBSchemaGen Solves PCB Schematic Design with 81% Pass Rate

ACUTE Protocol Improves LLM Calibration and Trustworthiness with Activation-Based Confidence Estimates

LedgerAgent: A New Method for Policy-Adherent Tool-Calling AI Agents in Customer Service

Study Reveals How Mixed Compliance Demonstrations Affect LLM Safety Alignment

MoCA-Agent: Market-of-Claims Code Agent Achieves Strong Results in Financial and Numerical Reasoning

G2Rec Framework Structures and Tokenizes User Interests for Generative Recommendation

Hierarchical Control in Multi-Agent Games: LLM Planning with RL Execution Outperforms Flat Learning

AutoPass: Evidence-Guided LLM Agents Achieve Compiler Speedups of 1.117x on ARM64

LLM-Driven Stepwise Refinement Framework Promises Verifiable Hardware Generation

Independent Combinatorial Tokens Framework Boosts LLM Reasoning Performance by Up to 14.9%

Where to Place the Query? Unveiling and Mitigating Positional Bias in Diffusion LLMs via Decoding Dynamics

FAPO Framework Lets Claude Code Autonomously Optimize Multi-Step LLM Pipelines, Beats Baseline by 14.1 Points

Multi-LCB: New Benchmark Evaluates LLMs Across 12 Programming Languages

Which Pairs to Compare for LLM Post-Training? Research Reveals Optimal Labeling Strategy

Agentic RAG Pipeline Achieves 96.5% Clinician Acceptance in Clinical Information Extraction

New Benchmark BIM-Edit Reveals Large Language Models Struggle with IFC-Based Building Information Model Editing

Narration Gap in LLM-Solver Loops Poses Risk for Enterprise AI Decision Pipelines

LLM Confidence Is Epistemically Vacuous: New Method Detects Blind Spots in Clinical Data

TreeTracer Visualizes Hidden LLM Bias Through Stochastic Path Aggregation for Enterprise AI Auditing

Hidden Anchors Reveal Why Multi-Agent LLM Deliberation Escapes Groupthink

LLM-Based A/B Testing Needs Calibration: New Statistical Framework Reveals 39% Accuracy Gap

CoT Transformers Can Efficiently Simulate Word RAM Algorithms, New Research Shows

DeepSeek-V4 Unveils Million-Token Context Models with Major Efficiency Gains

Researchers Identify Shrinkage Bias in LLM FP4 Pretraining, Propose UFP4 Recipe for Stability

QMFOL Benchmark Reveals LLM Reasoning Degrades with Logical Complexity, New Framework Enables Precise Evaluation

ScaffoldAgent: Utility-Guided Dynamic Outline Optimization for Open-Ended Deep Research

Beyond Static Leaderboards: Predictive Validity for Evaluating LLM Agents in Enterprise AI

New Study Challenges Prior Claims on Scaling Context Length in Imitation Learning