2605.14261v1 May 14, 2026 cs.AI

Heuristic Pathologies and Further Variance Reduction via Uncertainty Propagation in the AIVAT Family of Techniques

Original Abstract

How should an agent's performance in a multiagent environment be evaluated when there is a limited sample size or a high cost of running a trial? The AIVAT family of variance reduction techniques was proposed to address this challenge by introducing unbiased low-variance estimators of agents' expected payoffs. An important component of AIVAT is a heuristic value function that discriminates between potentially low- and high-value counterfactual histories. A notable gap in the literature is that there is little to no constraint or guideline on how the heuristic value function should be chosen or how uncertainty in its output should be handled. In our first contribution, we parameterize the heuristic value function to highlight AIVAT's potential vulnerabilities: a) the sample variance can be set pathologically low by directly applying gradient descent on the sample variance, and b) one can p-hack to draw a desired statistical conclusion via gradient descent/ascent on the test statistic. The main takeaway is that the heuristic value function should be fixed prior to observing the evaluation data! In our second contribution, we show how the heuristic uncertainty can be propagated to quantify the uncertainty of AIVAT estimates. It is then possible to further reduce the variance using inverse-variance weighted averaging, but AIVAT's unbiasedness guarantee may have to be sacrificed. In our experiments, we use a dataset of 10,000 poker hands to demonstrate our heuristic pathology and uncertainty results, with the latter yielding a 43.0% reduction in the number of samples (poker hands) needed to draw statistical conclusions.
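
The abstract describes AIVAT only at a high level. The Python sketch below illustrates the control-variate idea it builds on, using a hypothetical toy game with a single chance event. The estimator subtracts a heuristic value of the observed chance outcome and adds back that heuristic's expectation under the known chance distribution. This leaves the estimate unbiased while removing luck-driven variance, provided the heuristic is fixed before the data are seen. The sketch then combines two independent sessions with a generic textbook form of inverse-variance weighting. The game, `LUCK`, `SKILL`, the session sizes, and every function name are assumptions made for illustration. This is not the authors' implementation. Full AIVAT also applies corrections at the decision nodes of players whose strategies are known, and the paper propagates heuristic uncertainty into the variances. Neither of those steps is shown here.

```python
import random
import statistics

# Hypothetical toy game: one chance event draws a card c uniformly from
# {0, ..., K-1}. The payoff is a "skill" term we want to estimate, plus a
# "luck" term that depends only on c, plus independent noise.
K = 4
LUCK = [-3.0, -1.0, 1.0, 3.0]  # hypothetical luck component, mean 0
SKILL = 0.5                    # hypothetical true expected payoff


def play_hand(rng):
    """Simulate one hand and return (chance outcome, observed payoff)."""
    c = rng.randrange(K)
    return c, SKILL + LUCK[c] + rng.gauss(0.0, 1.0)


def fixed_heuristic(c):
    """Heuristic value of chance outcome c, chosen BEFORE seeing any data."""
    return LUCK[c]


def corrected_value(c, payoff, heuristic=fixed_heuristic):
    """Control-variate correction at the single chance node.

    Subtract the heuristic value of the observed outcome and add back its
    expectation under the known uniform chance distribution. For a heuristic
    fixed in advance, the correction has zero mean, so the estimate stays
    unbiased for any such heuristic; a good heuristic also lowers variance.
    """
    expected_h = sum(heuristic(k) for k in range(K)) / K
    return payoff - heuristic(c) + expected_h


def mean_and_var_of_mean(values):
    """Sample mean and the estimated variance of that mean."""
    return statistics.fmean(values), statistics.variance(values) / len(values)


def inverse_variance_weighted(means, variances):
    """Combine independent estimates, weighting each by 1 / variance.

    Returns the weighted mean and its variance 1 / sum(weights). When the
    variances are themselves estimated from the data, the weights are random
    and the combined mean is no longer guaranteed to be unbiased.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, means)) / total
    return mean, 1.0 / total


def run_demo(seed=0):
    rng = random.Random(seed)
    # Two independent evaluation sessions (sizes are arbitrary assumptions).
    session_a = [play_hand(rng) for _ in range(10_000)]
    session_b = [play_hand(rng) for _ in range(2_000)]

    raw_a = mean_and_var_of_mean([p for _, p in session_a])
    cv_a = mean_and_var_of_mean([corrected_value(c, p) for c, p in session_a])
    cv_b = mean_and_var_of_mean([corrected_value(c, p) for c, p in session_b])
    combined = inverse_variance_weighted([cv_a[0], cv_b[0]], [cv_a[1], cv_b[1]])
    return raw_a, cv_a, cv_b, combined


raw_a, cv_a, cv_b, combined = run_demo()
print(f"raw A        {raw_a[0]:+.3f}  var of mean {raw_a[1]:.2e}")
print(f"corrected A  {cv_a[0]:+.3f}  var of mean {cv_a[1]:.2e}")
print(f"corrected B  {cv_b[0]:+.3f}  var of mean {cv_b[1]:.2e}")
print(f"IVW combined {combined[0]:+.3f}  var of mean {combined[1]:.2e}")
```

With the fixed heuristic, the corrected estimate for session A should have a much smaller variance of the mean than the raw average. The inverse-variance combination of both sessions is smaller still. Re-fitting `fixed_heuristic` on `session_a` itself would void the zero-mean guarantee in `corrected_value`, which is the kind of data-dependent heuristic the abstract warns against.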
