Topic

skillvetch

1 story

SkillVetBench Uses LLM-as-Judge to Evaluate Security Risks in Open-Source Agent Skills

Artificial Intelligence #llm #security

SkillVetBench, a live Hugging Face leaderboard, uses an LLM-as-Judge approach to vet open-source LLM agent skills for security risks. It introduces the Skill Agentic Risk Score (SARS) and integrates CVSS v4.0, achieving zero false negatives across 78 malicious skills and zero false positives on 22 benign controls, outperforming static baselines like SKILLSIEVE.
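
To make the reported figures concrete, the sketch below turns the counts in the summary (78 malicious skills, 22 benign controls, no false negatives, no false positives) into standard detection metrics. The class and field names are hypothetical and are not taken from SkillVetBench; only the four counts come from the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary detection outcomes (not a SkillVetBench type)."""
    true_positives: int   # malicious skills flagged as malicious
    false_negatives: int  # malicious skills missed
    true_negatives: int   # benign skills passed as benign
    false_positives: int  # benign skills wrongly flagged

    @property
    def recall(self) -> float:
        # Share of malicious skills that were caught.
        return self.true_positives / (self.true_positives + self.false_negatives)

    @property
    def false_positive_rate(self) -> float:
        # Share of benign skills that were wrongly flagged.
        return self.false_positives / (self.false_positives + self.true_negatives)

    @property
    def accuracy(self) -> float:
        # Share of all skills classified correctly.
        total = (self.true_positives + self.false_negatives
                 + self.true_negatives + self.false_positives)
        return (self.true_positives + self.true_negatives) / total


# Counts as stated in the summary: 78 malicious (0 missed), 22 benign (0 flagged).
reported = ConfusionCounts(true_positives=78, false_negatives=0,
                           true_negatives=22, false_positives=0)

print(f"recall={reported.recall:.2f}, "
      f"fpr={reported.false_positive_rate:.2f}, "
      f"accuracy={reported.accuracy:.2f}")
```

On a 100-skill evaluation set of this size, a single missed malicious skill would lower recall to 77/78, so these figures say little about behavior on larger or differently distributed sets.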

Jun 16, 2026 1 source