2605.29277v1 May 28, 2026 cs.SE

Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA

Zhongkai Sun
Jianying Qu
Hanwen Du
Ye Yang
Qiao Zhao
Jun Zhang

We present Code-QA-Bench, a fully automated framework for synthesizing repository-level code understanding benchmarks that separates genuine code comprehension from documentation recall and pretraining memorization. The framework makes two methodological contributions: (1) an answer-first generation pipeline in which a tool-equipped agent explores source code to produce verified gold answers before deriving questions, ensuring every task is grounded in real code structure; and (2) a three-condition experimental design that evaluates agents under closed-book (no repository), code-only (documentation removed), and documented (full repository) conditions, with the deltas between conditions directly quantifying documentation utility and memorization. We generate 528 code-derivable and 100 doc-dependent tasks across 10 Python repositories from SWE-Bench, scored by an LLM judge on accuracy, completeness, and specificity. Experiments on four frontier models show that code access is the dominant factor (+0.23 mean gain over closed-book), that documentation provides a modest additional benefit (+0.071 on doc-dependent tasks), and that code-only $\approx$ documented on code-derivable tasks, which validates the design. The framework is open-source and applicable to any well-documented Python repository.
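The three-condition design can be illustrated with a minimal sketch of how per-condition scores might be turned into deltas. The abstract does not give the exact formulas, so the definitions below are assumptions: the gain from code access is taken as code-only minus closed-book, the documentation gain as documented minus code-only, and the closed-book mean as a rough proxy for what a model can answer from memorization alone. The function name `condition_deltas`, the dictionary keys, and the toy scores are all hypothetical and are not results from the paper.

```python
from statistics import mean

# Hypothetical condition names mirroring the three settings in the abstract.
CONDITIONS = ("closed_book", "code_only", "documented")


def condition_deltas(scores):
    """Compute illustrative deltas between evaluation conditions.

    `scores` maps each condition name to a list of per-task judge scores
    in [0, 1]. The delta definitions are assumptions, not the paper's
    confirmed formulas.
    """
    means = {c: mean(scores[c]) for c in CONDITIONS}
    return {
        # Assumed proxy for memorization: score with no repository access.
        "closed_book_mean": means["closed_book"],
        # Assumed gain from reading the code (documentation removed).
        "code_gain": means["code_only"] - means["closed_book"],
        # Assumed additional gain from restoring documentation.
        "doc_gain": means["documented"] - means["code_only"],
    }


# Toy scores for four imaginary tasks; made up purely for illustration.
toy_scores = {
    "closed_book": [0.5, 0.25, 0.75, 0.5],
    "code_only": [0.75, 0.75, 0.75, 0.75],
    "documented": [1.0, 0.75, 0.75, 1.0],
}

if __name__ == "__main__":
    for name, value in condition_deltas(toy_scores).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, a `doc_gain` near zero on code-derivable tasks would correspond to the reported observation that code-only $\approx$ documented, while a larger `doc_gain` on doc-dependent tasks would indicate that documentation carries information not recoverable from the code.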
