SkillVetBench Uses LLM-as-Judge to Evaluate Security Risks in Open-Source Agent Skills

SkillVetBench, a live Hugging Face leaderboard, uses an LLM-as-Judge approach to vet open-source LLM agent skills for security risks. It introduces the Skill Agentic Risk Score (SARS) and integrates CVSS v4.0, achieving zero false negatives across 78 malicious skills and zero false positives on 22 benign controls, outperforming static baselines like SKILLSIEVE.

iGEN Editorial
June 16, 2026
The rapid growth of open-source LLM agent ecosystems has introduced new security challenges, particularly from community-contributed skills that extend agent capabilities. These modular tool definitions often go unvetted, leaving systems vulnerable to attacks at the instruction layer that traditional code scanners cannot detect. To address this gap, researchers have developed SkillVetBench, a live public leaderboard on Hugging Face that employs an LLM-as-Judge framework to evaluate agent skills across multiple security dimensions.

The Problem: Code-Layer Blindness

Existing security scanners operate at the code layer and are structurally blind to instruction-layer and multi-agent risks. These include natural-language directives that can hijack an agent, exfiltrate data through encoded side channels, or chain harm across processing pipelines. According to the SkillVetBench paper, conventional tools miss between 89% and 100% of instruction-layer threats such as Prompt Injection and Memory Poisoning. For example, the code analysis tool CODEBERT detected none of nine memory-poisoning skills.

SkillVetBench and the SARS Metric

SkillVetBench introduces the Skill Agentic Risk Score (SARS), a five-dimensional agentic-risk metric with a principled weighted formula designed for instruction-following systems. The platform integrates full CVSS v4.0 vector decomposition and features a ClawHub dual-view, which places the LLM-generated review alongside the official marketplace verdict. This allows users to compare automated assessments with human moderation directly.
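The article does not list the five SARS dimensions or their weights, so the following is only a generic sketch of how a weighted multi-dimensional risk score can be computed. It is not the paper's formula. The function name `weighted_risk_score`, the placeholder dimension names `dim_1` through `dim_5`, the equal weights, and the [0, 1] score range are all assumptions made for illustration.

```python
def weighted_risk_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each expected in [0, 1].

    Hypothetical helper: the real SARS dimensions and weights are not
    given in the article, so nothing here reproduces the paper's formula.
    """
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the total weight so the result stays in [0, 1].
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Placeholder dimensions and equal weights (assumptions, not from the paper).
example_scores = {"dim_1": 0.0, "dim_2": 0.25, "dim_3": 0.5, "dim_4": 0.75, "dim_5": 1.0}
example_weights = {name: 1.0 for name in example_scores}
example_sars = weighted_risk_score(example_scores, example_weights)
print(example_sars)
```

Dividing by the total weight means the weights do not need to sum to one, which makes it easier to adjust a single dimension's importance without rescaling the others.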

Zero False Negatives, Zero False Positives

In the companion benchmark study, the LLM-as-Judge stage achieved zero false negatives across 78 confirmed-malicious skills and zero false positives across 22 benign controls. By contrast, the best static baseline, SKILLSIEVE, still missed 15% of malicious skills. On this benchmark, the result favors semantic, LLM-based evaluation over traditional signature-based methods.
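For readers who want the reported figures as rates, the counts above can be turned into a false-negative rate and a false-positive rate. The sketch below uses only the numbers stated in the article (78 malicious skills, 22 benign controls, no errors for the LLM-as-Judge stage). The `Confusion` class and its method names are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary malicious/benign evaluation (illustrative type)."""
    tp: int  # malicious skills flagged as malicious
    fn: int  # malicious skills missed
    fp: int  # benign skills wrongly flagged
    tn: int  # benign skills correctly passed

    def false_negative_rate(self) -> float:
        # Share of malicious skills that were missed.
        return self.fn / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        # Share of benign skills that were wrongly flagged.
        return self.fp / (self.fp + self.tn)


# Reported LLM-as-Judge result: 78 malicious, 22 benign, zero errors.
judge = Confusion(tp=78, fn=0, fp=0, tn=22)
print(judge.false_negative_rate(), judge.false_positive_rate())
```

With only 22 benign controls, a false-positive rate of zero is an encouraging result rather than a tight estimate, which is worth keeping in mind when comparing tools.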

Instruction-Layer Threats: A Critical Blind Spot

Threat Category     Conventional Tool Detection Rate    SkillVetBench Performance
Prompt Injection    0–11%                               Zero false negatives overall
Memory Poisoning    0% (CODEBERT)                       Zero false negatives overall

Conventional code scanners fail to catch instruction-layer attacks because they lack semantic understanding. The SkillVetBench approach, by using an LLM as judge, can interpret natural-language commands and identify malicious intent that would otherwise slip through.

Variability Across LLM Evaluators

Detection rates varied from 35% to 95% across four LLM evaluators tested in the paper. This variability motivates the use of ensemble scoring in production deployments, where multiple judges vote on risk severity. The paper notes that no single LLM judge is sufficient for reliable security vetting.
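Ensemble scoring of this kind can be sketched as a simple majority vote over the judges' severity labels. The article does not describe the paper's actual aggregation rule, so the function `ensemble_verdict`, the five-level severity scale, the judge names, and the tie-breaking choice (prefer the more severe label) are all assumptions.

```python
from collections import Counter
from typing import Mapping

# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["benign", "low", "medium", "high", "critical"]


def ensemble_verdict(votes: Mapping[str, str]) -> str:
    """Return the most common severity label among the judges.

    Ties are broken toward the more severe label, a conservative
    choice for security vetting (an assumption, not the paper's rule).
    """
    if not votes:
        raise ValueError("at least one judge vote is required")
    counts = Counter(votes.values())
    top = max(counts.values())
    tied = [label for label, n in counts.items() if n == top]
    # SEVERITY_ORDER.index raises ValueError for an unknown label.
    return max(tied, key=SEVERITY_ORDER.index)


# Four hypothetical judges, mirroring the four evaluators in the study.
example_votes = {"judge_a": "high", "judge_b": "high", "judge_c": "low", "judge_d": "medium"}
print(ensemble_verdict(example_votes))
```

Breaking ties toward the more severe label trades some extra false alarms for fewer missed threats, which matches the article's emphasis on avoiding false negatives.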

The researchers (Ismail Hossain, Sai Puppala, Md Jahangir Alam, Tanzim Ahad, and Sajedul Talukder) have made SkillVetBench publicly available on Hugging Face to help the open-source community vet agent skills before deployment. As LLM agents become more common in enterprise workflows, tools like SkillVetBench add a layer of security that code-level scanners cannot offer. For technology procurement leaders and enterprise software buyers, this is a step toward safer adoption of open-source AI components.

