2605.29653v1 May 28, 2026 cs.AI

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

Renhong Huang

Citations: 65

h-index: 3

Dongdong Hua

Citations: 1

h-index: 1

Yang Yang

Citations: 8

h-index: 2

Yifei Sun

Citations: 8

h-index: 2

Feng Gao

Citations: 11

h-index: 2

Chunping Wang

Citations: 487

h-index: 12

Given a strategically complex board game, human players can quickly learn to devise strategies after playing a few rounds. Autonomous agents require similar capabilities in realistic interactive environments, yet existing agent benchmarks often fail to fully capture such strategic and evolving decision-making scenarios. We present PTCG-Bench, a benchmark built on the Pok'{e}mon Trading Card Game (PTCG) that evaluates LLM agents at two complementary levels: (1) their decision-making performance within a single complex environment, and (2) their ability to self-evolving through accumulated experience. We further include a modular harness ablation to better interpret agent performance without conflating it with model capability. Our experiments show that, although LLM agents can achieve non-trivial gameplay performance, sustained and stable self-evolution remains challenging, and performance is sensitive to harness design. We hope that PTCG-Bench will facilitate future research on harness-aware and self-evolving agents in realistic interactive environments.

0 Citations

0 Influential

6 Altmetric

30.0 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!