2606.11543v1 Jun 10, 2026 cs.AI

SkillJuror: Measuring How Agent Skill Organization Changes Runtime Behavior

Jianghao Lin
Shanghai Jiao Tong University
Bingwei Lu
Bo-Sheng Huang
Yuanjian Zhou
Weinan Zhang
Zhiyu Chen
Zihan Guo

Agent Skills augment large language model (LLM) agents with procedural knowledge at inference time, but current benchmarks rarely distinguish what a Skill says from how it is organized. We study this distinction through Progressive Disclosure, where a concise root file points agents to supporting resources on demand, and compare it with a normalized flat baseline. We present SkillJuror, a framework for evaluating Skill writing paradigms through semantically controlled variants, matched multi-trial evaluations, and trajectory evidence while holding task knowledge fixed. In an 82-task SkillsBench study, Progressive Disclosure changes runtime behavior before aggregate outcomes: distinct Skill resources touched per trajectory rise from 1.18 to 3.85, and effective uptake events rise from 1.33 to 3.92. It also yields 17 additional verifier-passing trials out of 410 matched trials (+4.1 percentage points) over the normalized flat baseline. The benefit is task-dependent. Progressive Disclosure helps when supporting resources guide implementation, checking, or repair, but is weaker when success hinges on exact output conventions, numerical thresholds, or long artifact-generation pipelines. These results show that Skill organization is not mere presentation: it can change how agents search and apply procedural knowledge, while outcome gains depend on whether the exposed resources are actionable for the task. Code is available at https://github.com/zhiyuchen-ai/skill-juror.
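
The headline figures in the abstract can be related to one another with simple arithmetic. The sketch below is only a sanity check on the quoted numbers, not part of SkillJuror: the function and constant names are hypothetical, and the trials-per-task value assumes the 410 matched trials are spread evenly across the 82 tasks, which the abstract does not state.

```python
# Figures quoted in the abstract.
TASKS = 82
MATCHED_TRIALS = 410
EXTRA_PASSES = 17
RESOURCES_FLAT, RESOURCES_PD = 1.18, 3.85   # distinct Skill resources per trajectory
UPTAKE_FLAT, UPTAKE_PD = 1.33, 3.92         # effective uptake events


def pass_rate_gain_points(extra_passes: int, trials: int) -> float:
    """Pass-rate gain in percentage points implied by extra passing trials."""
    return 100.0 * extra_passes / trials


def fold_change(before: float, after: float) -> float:
    """Ratio of the Progressive Disclosure value to the flat-baseline value."""
    return after / before


gain = pass_rate_gain_points(EXTRA_PASSES, MATCHED_TRIALS)
resource_fold = fold_change(RESOURCES_FLAT, RESOURCES_PD)
uptake_fold = fold_change(UPTAKE_FLAT, UPTAKE_PD)

# Assumption (not stated in the abstract): trials are split evenly over tasks.
trials_per_task = MATCHED_TRIALS / TASKS

print(f"pass-rate gain: {gain:.1f} percentage points")
print(f"resources touched: {resource_fold:.2f}x the flat baseline")
print(f"uptake events: {uptake_fold:.2f}x the flat baseline")
print(f"trials per task (if evenly split): {trials_per_task:g}")
```

Read this way, the behavioral measures roughly triple while the pass rate moves by about four percentage points, which matches the abstract's point that runtime behavior shifts more visibly than aggregate outcomes.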

