
Benign in Isolation, Harmful in Composition: Security Risks in Agent Skill Ecosystems

A new preprint on arXiv introduces Skill Composition Risk (SCR) and the SCR-Bench benchmark, showing that LLM agent skills judged safe in isolation can become harmful when composed in multi-step tasks. Attack success rates rise from near zero to over 96% in certain compositions, which challenges security vetting practices that evaluate skills one at a time.

iGEN Editorial
June 17, 2026

As enterprises deploy LLM-powered agents to automate workflows, the security of agent skill ecosystems has emerged as a critical concern. Skills—the capability layer through which agents turn plans into actions—introduce risks such as data leakage, unauthorized operations, and tool misuse. According to a new paper on arXiv, traditional security vetting evaluates each skill in isolation, but real-world agent tasks often invoke multiple skills in a shared execution context. This creates a previously underexplored vulnerability called Skill Composition Risk (SCR): a skill that appears benign alone can become harmful when its outputs, trust signals, authorization cues, or side effects influence later invocations along an activated path.

The SCR-Bench Framework

To systematically evaluate SCR, the researchers developed SCR-Bench, a benchmark operating in controlled, sandboxed skill environments. Rather than relying solely on textual intent or surface behavior, SCR-Bench records downstream state changes and path-level outcomes across composed skill executions. The benchmark comprises three sub-benchmarks designed to capture different composition mechanisms:

  • SCR-CapFlow: Tests capability-flow composition, where a skill's output capabilities are passed to subsequent skills.
  • SCR-TrustLift: Examines trust-transfer composition, where trust signals from one skill elevate the trust of later skills.
  • SCR-AuthBlur: Assesses authorization-confusion composition, where authorization cues become blurred across skill boundaries.
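To make the capability-flow case concrete, here is a minimal toy sketch of state-based evaluation in a sandbox. It is not SCR-Bench code: the `Sandbox` class, the two skills, and the `leaked` check are all hypothetical names invented for illustration. Each skill leaves the sandbox in a harmless state when run alone, but the composed path moves a secret into the outbox.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """Hypothetical sandbox state shared by all skills on one path."""
    files: dict = field(default_factory=lambda: {"secret.txt": "API_KEY=abc123"})
    outbox: list = field(default_factory=list)   # messages leaving the sandbox
    context: dict = field(default_factory=dict)  # shared execution context


def read_note(sb: Sandbox) -> None:
    """Hypothetical skill: loads a file into the shared context."""
    sb.context["note"] = sb.files["secret.txt"]


def send_summary(sb: Sandbox) -> None:
    """Hypothetical skill: sends whatever note is in context to an external recipient."""
    sb.outbox.append(sb.context.get("note", "nothing to report"))


def leaked(sb: Sandbox) -> bool:
    """State-based check: did secret content leave the sandbox?"""
    return any("API_KEY" in msg for msg in sb.outbox)


def run_path(skills) -> bool:
    """Run skills in order in a fresh sandbox and report the final-state check."""
    sb = Sandbox()
    for skill in skills:
        skill(sb)
    return leaked(sb)


# Isolated evaluation: each skill on its own path.
isolated = {s.__name__: run_path([s]) for s in (read_note, send_summary)}
# Composed evaluation: the same two skills in one shared context.
composed = run_path([read_note, send_summary])

print("isolated:", isolated)
print("composed:", composed)
```

The check looks only at the sandbox's final state, not at what either skill claims to do, which mirrors the article's point about recording downstream state changes rather than surface behavior.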

Key Findings: Attack Success Rates Under Composition

The paper reports stark contrasts between isolated and composed evaluations. The table below summarizes the attack success rates (ASR) for each sub-benchmark:

Sub-benchmark  | Isolated baseline     | Composed path
SCR-CapFlow    | ASR near 0%           | ASR 33.6%
SCR-TrustLift  | ASR near 0%           | ASR above 96.5% on 4 of 5 backends
SCR-AuthBlur   | L0 isolated baseline  | Risky-approval rate up 71.8% relative to L0 (L1 context)

According to the paper, composed paths expose risks largely absent under isolated evaluation. In SCR-CapFlow, attack success rate reaches 33.6% under composition, compared with near-zero isolated baselines. For SCR-TrustLift, the attack success rate exceeds 96.5% on four of five backends. In SCR-AuthBlur, the risky-approval rate increases by 71.8% relative to the L0 isolated baseline under the L1 context setting.
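As a reading aid, attack success rate is simply the fraction of trials in which the attack succeeds. The sketch below computes it for an isolated and a composed setting and reports the gap in percentage points. The trial outcomes are toy values chosen for illustration, not data from the paper, and the function name is an assumption.

```python
def attack_success_rate(outcomes):
    """Fraction of trials in which the attack succeeded (True)."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


# Toy outcomes for illustration only, NOT results from the paper.
isolated_trials = [False] * 20
composed_trials = [True] * 7 + [False] * 13

isolated_asr = attack_success_rate(isolated_trials)
composed_asr = attack_success_rate(composed_trials)

# Absolute gap in percentage points between the two settings.
delta_pp = (composed_asr - isolated_asr) * 100

print(f"isolated ASR: {isolated_asr:.1%}")
print(f"composed ASR: {composed_asr:.1%}")
print(f"difference:   {delta_pp:.1f} percentage points")
```

When the isolated baseline is at or near zero, a ratio between the two rates is undefined or unstable, so the absolute difference in percentage points is the more robust way to summarize the gap.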

Implications for Enterprise Security

For CTOs and technology leaders integrating agent ecosystems, the findings underscore that agent skill security must be assessed at the level of activated paths rather than isolated artifacts. A skill that passes all individual checks could, when combined with others, enable unauthorized operations, data exfiltration, or privilege escalation. The paper positions SCR and SCR-Bench as a foundation for path-aware risk evaluation and defense in LLM agent skill ecosystems. Enterprises relying on agent workflows—such as automated supply-chain decisions or trade documentation processing—should incorporate path-level security testing before deployment.
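One way to approximate path-level testing before deployment is a static check over ordered skill sequences. The sketch below is an assumption-laden illustration, not the paper's method: the skill names, the capability tags (`read_sensitive`, `send_external`), and the rule "a sensitive read followed later by an external send is risky" are all hypothetical.

```python
from itertools import permutations

# Hypothetical skill manifest: each skill declares capability tags.
SKILLS = {
    "read_invoice": {"read_sensitive"},
    "translate_text": set(),
    "post_webhook": {"send_external"},
}


def is_risky(path, skills):
    """Flag a path where a sensitive read precedes an external send."""
    seen_sensitive = False
    for name in path:
        caps = skills[name]
        if "send_external" in caps and seen_sensitive:
            return True
        if "read_sensitive" in caps:
            seen_sensitive = True
    return False


def risky_paths(skills, max_len=3):
    """Enumerate ordered paths of 2..max_len distinct skills and keep risky ones."""
    flagged = []
    for n in range(2, max_len + 1):
        for path in permutations(skills, n):
            if is_risky(path, skills):
                flagged.append(path)
    return flagged


# No single skill is flagged on its own under this rule.
single_flags = [is_risky((name,), SKILLS) for name in SKILLS]
flagged = risky_paths(SKILLS)

print("any single skill flagged:", any(single_flags))
for path in flagged:
    print(" -> ".join(path))
```

Exhaustive enumeration grows factorially with the number of skills, so a real deployment would likely restrict the search to paths the agent can actually activate. Declared tags can also be wrong or incomplete, so a static check like this would complement sandboxed execution rather than replace it.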

The preprint, authored by researchers Xie, Du, Jiawei, Cheng, Yu, Zhou, Jiuan, Yin, and Zhaoxia, is available on arXiv and includes a public benchmark repository for further study.

