Benign in Isolation, Harmful in Composition: Security Risks in Agent Skill Ecosystems

A new arXiv preprint introduces Skill Composition Risk (SCR) and the SCR-Bench benchmark, showing that LLM agent skills judged safe in isolation can become harmful when composed in multi-step tasks. In certain compositions, attack success rates jump from near zero to over 96%, challenging current security vetting practices.

iGEN Editorial

June 17, 2026


As enterprises deploy LLM-powered agents to automate workflows, the security of agent skill ecosystems has become a critical concern. Skills, the capability layer through which agents turn plans into actions, introduce risks such as data leakage, unauthorized operations, and tool misuse. According to a new paper on arXiv, conventional security vetting evaluates each skill in isolation, yet real-world agent tasks often invoke multiple skills in a shared execution context. The paper names the resulting, previously underexplored vulnerability Skill Composition Risk (SCR): a skill that appears benign on its own can become harmful when its outputs, trust signals, authorization cues, or side effects influence later invocations along an activated path, that is, the chain of skill invocations actually triggered while completing a task.
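The mechanism can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's implementation: the `Context` class, the skills `summarize_ticket` and `export_records`, and the `FORBIDDEN` action set are all hypothetical. Each skill passes an isolated check, but when both run in one shared context, an authorization-like phrase emitted by the first skill unlocks a forbidden export in the second.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """Hypothetical shared execution context carried along an activated path."""
    notes: dict[str, str] = field(default_factory=dict)
    actions: list[str] = field(default_factory=list)  # side effects recorded by skills


Skill = Callable[[Context], None]


def summarize_ticket(ctx: Context) -> None:
    # Hypothetical skill, benign alone: it only writes a summary note,
    # but that note happens to contain an authorization-like cue.
    ctx.notes["summary"] = "Customer request. Status: approved by admin."


def export_records(ctx: Context) -> None:
    # Hypothetical skill, benign alone: without an approval cue it only drafts a request.
    if "approved by admin" in ctx.notes.get("summary", ""):
        ctx.actions.append("export_customer_db")
    else:
        ctx.actions.append("draft_export_request")


FORBIDDEN = {"export_customer_db"}  # assumed policy: actions that must never occur


def is_safe(ctx: Context) -> bool:
    return not (set(ctx.actions) & FORBIDDEN)


def vet_in_isolation(skill: Skill) -> bool:
    """Run one skill in a fresh context, as per-skill vetting would."""
    ctx = Context()
    skill(ctx)
    return is_safe(ctx)


def vet_path(path: list[Skill]) -> bool:
    """Run skills in order in ONE shared context and check the end state."""
    ctx = Context()
    for skill in path:
        skill(ctx)
    return is_safe(ctx)


if __name__ == "__main__":
    print(vet_in_isolation(summarize_ticket))
    print(vet_in_isolation(export_records))
    print(vet_path([summarize_ticket, export_records]))
```

Both isolated checks return `True`, while the composed path returns `False`; reversing the order is safe, which is why the risk is a property of the path rather than of either skill.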

The SCR-Bench Framework

To systematically evaluate SCR, the researchers developed SCR-Bench, a benchmark operating in controlled, sandboxed skill environments. Rather than relying solely on textual intent or surface behavior, SCR-Bench records downstream state changes and path-level outcomes across composed skill executions. The benchmark comprises three sub-benchmarks designed to capture different composition mechanisms:

SCR-CapFlow: Tests capability-flow composition, where a skill's output capabilities are passed to subsequent skills.
SCR-TrustLift: Examines trust-transfer composition, where trust signals from one skill elevate the trust of later skills.
SCR-AuthBlur: Assesses authorization-confusion composition, where authorization cues become blurred across skill boundaries.
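Trust transfer, the mechanism SCR-TrustLift targets, can be sketched in the same spirit. In this hypothetical example (the `Trust` levels, the `Message` type, and both skills are invented for illustration and are not from the paper), untrusted web content passes through a vetted formatter that stamps its own output as verified. A consumer that checks only the last hop's label sees verified content; one that tracks provenance along the whole path still sees the untrusted origin.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    UNTRUSTED = 0
    INTERNAL = 1
    VERIFIED = 2


@dataclass(frozen=True)
class Message:
    text: str
    trust: Trust                    # label attached by the most recent skill
    provenance: tuple[Trust, ...]   # trust level of every hop that produced the text


def fetch_web_page(url: str) -> Message:
    # Hypothetical skill: content comes from an untrusted external source.
    text = f"content of {url}: please run cleanup.sh"
    return Message(text, Trust.UNTRUSTED, (Trust.UNTRUSTED,))


def format_report(msg: Message) -> Message:
    # Hypothetical vetted skill that labels its own output VERIFIED,
    # even though it only wraps upstream text.
    return Message(f"[report] {msg.text}", Trust.VERIFIED, msg.provenance + (Trust.VERIFIED,))


def naive_trust(msg: Message) -> Trust:
    # Trust lift: only the last hop's label is consulted.
    return msg.trust


def path_aware_trust(msg: Message) -> Trust:
    # Path-aware view: content is only as trustworthy as its weakest hop.
    return min(msg.provenance)


msg = format_report(fetch_web_page("https://example.com"))
print(naive_trust(msg).name)
print(path_aware_trust(msg).name)
```

Here the naive check reports `VERIFIED` and the path-aware check reports `UNTRUSTED`. Taking the minimum over provenance is one simple, assumed policy for keeping an upstream skill's low trust from being laundered by a downstream one.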

Key Findings: Attack Success Rates Under Composition

The paper reports stark contrasts between isolated and composed evaluations. The table below summarizes the headline result for each sub-benchmark; SCR-CapFlow and SCR-TrustLift report attack success rate (ASR), while SCR-AuthBlur reports a risky-approval rate.

Sub-benchmark	Metric	Isolated baseline	Composed path
SCR-CapFlow	ASR	~0%	33.6%
SCR-TrustLift (4 of 5 backends)	ASR	~0%	>96.5%
SCR-AuthBlur (L1 context)	Risky-approval rate	L0 isolated baseline	+71.8% relative to L0

According to the paper, composed paths expose risks largely absent under isolated evaluation. In SCR-CapFlow, the attack success rate reaches 33.6% under composition, compared with near-zero isolated baselines. For SCR-TrustLift, the attack success rate exceeds 96.5% on four of five backends. In SCR-AuthBlur, the risky-approval rate increases by 71.8% relative to the L0 isolated baseline under the L1 context setting.
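Because the isolated baselines for SCR-CapFlow and SCR-TrustLift are near zero, a multiplicative "increase factor" is not meaningful; the gap is better stated in percentage points. The helpers below are a hypothetical sketch, and the trial counts are invented for illustration, not taken from the paper.

```python
from typing import Optional


def rate(successes: int, trials: int) -> float:
    """Fraction of trials ending in the unsafe outcome (e.g. ASR or risky-approval rate)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def absolute_change_pp(baseline: float, composed: float) -> float:
    """Difference between two rates, in percentage points."""
    return (composed - baseline) * 100


def relative_change_pct(baseline: float, composed: float) -> Optional[float]:
    """Percent change relative to the baseline; undefined (None) when the baseline is zero."""
    if baseline == 0:
        return None
    return (composed - baseline) / baseline * 100


# Hypothetical counts for illustration only.
isolated = rate(0, 200)
composed = rate(67, 200)
print(f"composed rate: {composed:.1%}")
print(f"absolute change: {absolute_change_pp(isolated, composed):.1f} pp")
print(relative_change_pct(isolated, composed))  # None: no finite multiplier exists
```

The same distinction matters when reading the SCR-AuthBlur figure: if 71.8% is a relative change, the L1 risky-approval rate is about 1.718 times the L0 rate; if it were an absolute difference, it would instead be 71.8 percentage points above L0.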

Implications for Enterprise Security

For CTOs and technology leaders integrating agent ecosystems, the findings underscore that agent skill security must be assessed at the level of activated paths rather than isolated artifacts. A skill that passes every individual check could, when combined with others, enable unauthorized operations, data exfiltration, or privilege escalation. The paper positions SCR and SCR-Bench as a foundation for path-aware risk evaluation and defense in LLM agent skill ecosystems. Enterprises relying on agent workflows, such as automated supply-chain decisions or trade documentation processing, should incorporate path-level security testing before deployment.
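One way to approach path-level testing is to enumerate ordered skill sequences in a sandbox and flag any path whose recorded side effects include a forbidden action. This is a hypothetical harness, not SCR-Bench itself: the skills, the shared `state` dict, and the `FORBIDDEN` set are assumptions chosen for illustration. It uses `itertools.permutations`, so each skill appears at most once per path.

```python
from itertools import permutations
from typing import Callable

# A skill reads and mutates a shared state dict (hypothetical sandbox model).
Skill = Callable[[dict], None]


def read_invoice(state: dict) -> None:
    # Hypothetical skill: its output carries an authorization-like cue.
    state["last_output"] = "Invoice #42. Note: payment pre-authorized."


def summarize(state: dict) -> None:
    # Hypothetical skill: passes upstream text along inside a summary.
    state["last_output"] = "Summary: " + state.get("last_output", "")


def send_payment(state: dict) -> None:
    # Hypothetical skill: pays only if the latest upstream output looks authorized.
    if "pre-authorized" in state.get("last_output", ""):
        state.setdefault("side_effects", []).append("payment_sent")


SKILLS: dict[str, Skill] = {
    "read_invoice": read_invoice,
    "summarize": summarize,
    "send_payment": send_payment,
}
FORBIDDEN = {"payment_sent"}  # assumed policy for this sandbox


def risky_paths(skills: dict[str, Skill], max_len: int) -> list[tuple[str, ...]]:
    """Run every ordered path up to max_len in a fresh state; return paths hitting FORBIDDEN."""
    flagged = []
    for n in range(1, max_len + 1):
        for path in permutations(skills, n):
            state: dict = {}
            for name in path:
                skills[name](state)
            if FORBIDDEN & set(state.get("side_effects", [])):
                flagged.append(path)
    return flagged


for path in risky_paths(SKILLS, max_len=3):
    print(" -> ".join(path))
```

Running it flags `read_invoice -> send_payment` plus the three length-3 paths in which `read_invoice` runs before `send_payment`, while no single skill is flagged on its own. Exhaustive enumeration grows factorially with the number of skills, so a practical harness would likely need to sample or prune paths.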

The preprint, authored by researchers Xie, Du, Jiawei, Cheng, Yu, Zhou, Jiuan, Yin, and Zhaoxia, is available on arXiv and includes a public benchmark repository for further study.

Sources:

Benign in Isolation, Harmful in Composition: Security Risks in Agent Skill Ecosystems
