2605.29653v1 May 28, 2026 cs.AI

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

Renhong Huang
Dongdong Hua
Yang Yang
Yifei Sun
Feng Gao
Chunping Wang

Given a strategically complex board game, human players can quickly learn to devise strategies after playing only a few rounds. Autonomous agents require similar capabilities in realistic interactive environments, yet existing agent benchmarks often fail to capture such strategic, evolving decision-making scenarios. We present PTCG-Bench, a benchmark built on the Pokémon Trading Card Game (PTCG) that evaluates LLM agents at two complementary levels: (1) their decision-making performance within a single complex environment, and (2) their ability to self-evolve through accumulated experience. We further include a modular harness ablation that helps interpret agent performance without conflating the effect of the harness with the capability of the underlying model. Our experiments show that although LLM agents can achieve non-trivial gameplay performance, sustained and stable self-evolution remains challenging, and performance is sensitive to harness design. We hope that PTCG-Bench will facilitate future research on harness-aware and self-evolving agents in realistic interactive environments.
