Artificial Intelligence #llm #security
SkillVetBench Uses LLM-as-Judge to Evaluate Security Risks in Open-Source Agent Skills
SkillVetBench, a live leaderboard hosted on Hugging Face, uses an LLM-as-Judge approach to vet open-source LLM agent skills for security risks. It introduces the Skill Agentic Risk Score (SARS) and integrates CVSS v4.0 severity scoring. In the reported evaluation it produced zero false negatives across 78 malicious skills and zero false positives on 22 benign controls, outperforming static baselines such as SKILLSIEVE.
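To make the reported figures concrete, here is a minimal Python sketch that turns the stated counts into standard detection metrics. It assumes the 78 malicious skills were all flagged (78 true positives, 0 false negatives) and the 22 benign controls were all cleared (22 true negatives, 0 false positives), which is what the zero-error claim implies. The names `ConfusionCounts` and `detection_metrics` are hypothetical and not part of SkillVetBench, and the sketch does not attempt to compute SARS or CVSS scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary-classification outcome counts (hypothetical helper type)."""
    tp: int  # malicious skills correctly flagged
    fn: int  # malicious skills missed
    fp: int  # benign skills wrongly flagged
    tn: int  # benign skills correctly cleared


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a class is empty.
    return numerator / denominator if denominator else float("nan")


def detection_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute recall, false-positive rate, precision, and accuracy."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "recall": _ratio(c.tp, c.tp + c.fn),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


# Counts implied by the summary: 78 malicious skills, 22 benign controls,
# with no misses and no false alarms.
reported = ConfusionCounts(tp=78, fn=0, fp=0, tn=22)
metrics = detection_metrics(reported)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On these counts, recall, precision, and accuracy are all 1.0 and the false-positive rate is 0.0. With only 100 samples in total, a perfect score is best read as a result on this particular test set, not as a guarantee for unseen skills.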
Jun 16, 2026 · 1 source