2602.01915v1 Feb 02, 2026 cs.LG

VLM-Guided Experience Replay

Tom Jurgenson (Citations: 158, h-index: 5)

Elad Sharony (Citations: 6, h-index: 1)

Orr Krupnik (Citations: 66, h-index: 4)

Dotan Di Castro (Citations: 1,358, h-index: 16)

Shie Mannor (Citations: 727, h-index: 13)

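Recent advances in large language models (LLMs) and vision-language models (VLMs) have enabled powerful semantic and multimodal reasoning, creating new opportunities to improve sample efficiency, high-level planning, and interpretability in reinforcement learning (RL). Prior work has integrated LLMs and VLMs into various components of RL, but the replay buffer, the core component that stores and reuses experience, has not yet been explored. This work uses a VLM to decide how experiences in the replay buffer are prioritized. The key idea is to use a pre-trained VLM (no fine-tuning required) as an automated evaluator that identifies and prioritizes promising sub-trajectories in the agent's experience. Across discrete and continuous domains, including game-playing and robotics scenarios, agents trained with the proposed prioritization method achieve 11-52% higher average success rates and 19-45% better sample efficiency than existing methods. (https://esharony.me/projects/vlm-rb/)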

Original Abstract

Recent advances in Large Language Models (LLMs) and Vision-Language Models (VLMs) have enabled powerful semantic and multimodal reasoning capabilities, creating new opportunities to enhance sample efficiency, high-level planning, and interpretability in reinforcement learning (RL). While prior work has integrated LLMs and VLMs into various components of RL, the replay buffer, a core component for storing and reusing experiences, remains unexplored. We propose addressing this gap by leveraging VLMs to guide the prioritization of experiences in the replay buffer. Our key idea is to use a frozen, pre-trained VLM (requiring no fine-tuning) as an automated evaluator to identify and prioritize promising sub-trajectories from the agent's experiences. Across scenarios, including game-playing and robotics, spanning both discrete and continuous domains, agents trained with our proposed prioritization method achieve 11-52% higher average success rates and improve sample efficiency by 19-45% compared to previous approaches. https://esharony.me/projects/vlm-rb/
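To make the idea of score-guided prioritization concrete, here is a minimal Python sketch of a replay buffer that samples sub-trajectories in proportion to an externally supplied score. It is an illustration under stated assumptions, not the authors' implementation: the class `ScoredReplayBuffer`, the `uniform_mix` parameter, and the `toy_scorer` function are all hypothetical names, and `toy_scorer` is only a placeholder for a call to a frozen VLM, which would presumably judge rendered frames rather than rewards.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical transition layout: (observation, action, reward, next_observation, done).
Transition = Tuple[object, object, float, object, bool]


class ScoredReplayBuffer:
    """Stores sub-trajectories and samples them in proportion to a score.

    The scorer stands in for a frozen VLM evaluator (an assumption of this
    sketch); it is called once when a sub-trajectory is added.
    """

    def __init__(self, capacity: int,
                 scorer: Callable[[Sequence[Transition]], float],
                 uniform_mix: float = 0.5, seed: int = 0) -> None:
        self.capacity = capacity
        self.scorer = scorer
        # Probability of ignoring scores and sampling uniformly (assumed knob).
        self.uniform_mix = uniform_mix
        self.segments: List[List[Transition]] = []
        self.scores: List[float] = []
        self.rng = random.Random(seed)

    def add(self, segment: Sequence[Transition]) -> None:
        if not segment:
            raise ValueError("segment must contain at least one transition")
        if len(self.segments) >= self.capacity:
            # Evict the oldest sub-trajectory together with its score.
            self.segments.pop(0)
            self.scores.pop(0)
        self.segments.append(list(segment))
        # Clamp to non-negative so the score can be used as a sampling weight.
        self.scores.append(max(float(self.scorer(segment)), 0.0))

    def sample(self, batch_size: int) -> List[Transition]:
        if not self.segments:
            raise ValueError("cannot sample from an empty buffer")
        batch: List[Transition] = []
        for _ in range(batch_size):
            no_signal = sum(self.scores) == 0.0
            if no_signal or self.rng.random() < self.uniform_mix:
                segment = self.rng.choice(self.segments)
            else:
                segment = self.rng.choices(
                    self.segments, weights=self.scores, k=1)[0]
            # Return one transition drawn uniformly from the chosen segment.
            batch.append(self.rng.choice(segment))
        return batch


def toy_scorer(segment: Sequence[Transition]) -> float:
    """Placeholder for a VLM call: 1.0 if any reward is positive, else 0.0."""
    return 1.0 if any(t[2] > 0 for t in segment) else 0.0
```

With `uniform_mix=0.0`, sampling follows the scores alone, so sub-trajectories scored zero are never drawn while any positively scored one remains in the buffer. Keeping some uniform sampling is one simple way to avoid starving unscored experience; whether the paper mixes the two in this way is not stated in the abstract.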

0 Citations

0 Influential

8 Altmetric

40.0 Score
