2603.01375v1 Mar 02, 2026 cs.AI

Words & Weights: Streamlining Multi-Turn Interactions via Co-Adaptation

Chenxing Wei

Citations: 34

h-index: 4

Ying He

Citations: 0

h-index: 0

F. Yu

Citations: 13

h-index: 2

Yao Shu

Citations: 19

h-index: 3

Bo Jiang

Citations: 7

h-index: 1

Zhongxiang Dai

Citations: 0

h-index: 0

Hong Wang

Citations: 20

h-index: 1

Original Abstract

Test-time policy adaptation for multi-turn interactions (T2PAM) is essential for aligning Large Language Models (LLMs) with dynamic user needs during inference time. However, existing paradigms commonly treat test-time adaptation as a single-axis problem, either purely refining instructions (Prompt Engineering) or only adjusting weights (Test-Time Training), ignoring that interaction failures stem from a coupled mix of ambiguity and incapacity. We argue that these two optimization paths are not merely additive but synergistic: semantic clarity acts as a pre-conditioner for effective parameter updates. To this end, we propose ROSA2, a framework that reformulates interaction as a joint optimization problem over the heterogeneous space of Words and Weights. By mathematically decomposing the error signal, ROSA2 utilizes textual gradients to rectify intent ambiguity and parameter updates to bridge capability gaps. Theoretically, we prove that this co-adaptation strictly reduces the required parameter shift for convergence. Empirically, ROSA2 outperforms state-of-the-art baselines by 30% on MATH while reducing interaction turns by 40%, demonstrating that refining the context unlocks the true potential of parameter updates.
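The claim that clarifying the context reduces the parameter shift needed for convergence can be illustrated with a toy example. The sketch below is not ROSA2 and does not reproduce the paper's algorithm or results; the abstract gives no implementation details. Everything in it is an assumption made for illustration: a linear model stands in for the LLM, a scalar `context` offset stands in for the prompt ("Words"), `word_step` is a hypothetical stand-in for a textual gradient, and `weight_step` is ordinary gradient descent on the parameters ("Weights").

```python
import math

# Toy data: the "true intent" is y = 2x + 3. Inputs are centred on zero.
DATA = [(x, 2.0 * x + 3.0) for x in (-2, -1, 0, 1, 2)]


def predict(weights, context, x):
    # The context offset plays the role of the prompt; weights are the model.
    slope, bias = weights
    return slope * x + bias + context


def mse(weights, context, data):
    return sum((predict(weights, context, x) - y) ** 2 for x, y in data) / len(data)


def word_step(weights, context, data):
    # Hypothetical stand-in for a textual gradient: feedback corrects the
    # context by the mean residual and leaves the weights untouched.
    residual = sum(y - predict(weights, context, x) for x, y in data) / len(data)
    return context + residual


def weight_step(weights, context, data, lr):
    # One gradient-descent step on the MSE with respect to the weights only.
    slope, bias = weights
    n = len(data)
    errors = [(predict(weights, context, x) - y, x) for x, y in data]
    grad_slope = sum(2 * e * x for e, x in errors) / n
    grad_bias = sum(2 * e for e, _ in errors) / n
    return (slope - lr * grad_slope, bias - lr * grad_bias)


def adapt(data, use_words, lr=0.1, tol=1e-6, max_turns=500):
    # Returns (turns until the MSE drops below tol, Euclidean parameter shift).
    weights, context = (0.0, 0.0), 0.0
    start = weights
    turns = max_turns
    for turn in range(1, max_turns + 1):
        if use_words:
            context = word_step(weights, context, data)
        weights = weight_step(weights, context, data, lr)
        if mse(weights, context, data) < tol:
            turns = turn
            break
    return turns, math.dist(start, weights)


weights_only_turns, weights_only_shift = adapt(DATA, use_words=False)
co_turns, co_shift = adapt(DATA, use_words=True)
print("weights only:", weights_only_turns, "turns, shift", round(weights_only_shift, 3))
print("co-adaptation:", co_turns, "turns, shift", round(co_shift, 3))
```

In this toy, the weights-only run has to absorb the missing offset into its bias parameter, so it moves further in parameter space and takes more turns. When the context is corrected first, only the slope still needs to be learned. This only mirrors the direction of the abstract's argument; it says nothing about the magnitudes reported for MATH.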

0 Citations

0 Influential

2 Altmetric

10.0 Score
