2601.07376v1 Jan 12, 2026 cs.AI

OpenTinker: Separating Concerns in Agentic Reinforcement Learning

Abstract

We introduce OpenTinker, an infrastructure for reinforcement learning (RL) of large language model (LLM) agents built around a separation of concerns across algorithm design, execution, and agent-environment interaction. Rather than relying on monolithic, end-to-end RL pipelines, OpenTinker decomposes agentic learning systems into lightweight, composable components with clearly defined abstraction boundaries. Users specify agents, environments, and interaction protocols, while inference and training are delegated to a managed execution runtime. OpenTinker introduces a centralized scheduler for managing training and inference workloads, including LoRA-based and full-parameter RL, supervised fine-tuning, and inference, over shared resources. We further discuss design principles for extending OpenTinker to multi-agent training. Finally, we present a set of RL use cases that demonstrate the effectiveness of the framework in practical agentic learning scenarios.
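
As a rough illustration of the separation the abstract describes, the sketch below keeps the user-side pieces (an environment and an interaction loop) apart from an execution-side scheduler that admits mixed jobs onto a shared GPU pool. Every name here (`Environment`, `rollout`, `Job`, `Scheduler`, `CountEnv`) is hypothetical and is not OpenTinker's API. The first-fit, FIFO admission rule is also an assumption, since the abstract does not state a scheduling policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Environment(Protocol):
    """User-side concern: what the agent interacts with (hypothetical interface)."""

    def reset(self) -> str: ...
    def step(self, action: str) -> tuple[str, float, bool]: ...


# Observation -> action. Stands in for LLM inference served by a runtime.
Policy = Callable[[str], str]


def rollout(env: Environment, policy: Policy, max_turns: int = 8) -> list[tuple[str, str, float]]:
    """Interaction protocol: a multi-turn loop that knows nothing about training."""
    trajectory = []
    obs = env.reset()
    for _ in range(max_turns):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return trajectory


@dataclass
class Job:
    name: str
    kind: str  # e.g. "lora_rl", "full_rl", "sft", "inference"
    gpus: int


@dataclass
class Scheduler:
    """Execution-side concern: admit jobs onto a shared GPU pool (assumed first-fit FIFO)."""

    total_gpus: int
    running: list[Job] = field(default_factory=list)
    waiting: list[Job] = field(default_factory=list)

    def free_gpus(self) -> int:
        return self.total_gpus - sum(job.gpus for job in self.running)

    def submit(self, job: Job) -> bool:
        """Start the job if it fits, otherwise queue it. Returns True if started."""
        if job.gpus <= self.free_gpus():
            self.running.append(job)
            return True
        self.waiting.append(job)
        return False

    def finish(self, name: str) -> None:
        """Release a job's GPUs, then admit queued jobs in order while they fit."""
        self.running = [job for job in self.running if job.name != name]
        still_waiting = []
        for job in self.waiting:
            if job.gpus <= self.free_gpus():
                self.running.append(job)
            else:
                still_waiting.append(job)
        self.waiting = still_waiting


class CountEnv:
    """Toy environment: reward 1.0 when the action is the next integer."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.count = 0

    def reset(self) -> str:
        self.count = 0
        return "0"

    def step(self, action: str) -> tuple[str, float, bool]:
        reward = 1.0 if action == str(self.count + 1) else 0.0
        self.count += 1
        return str(self.count), reward, self.count >= self.limit
```

In this sketch, `rollout(CountEnv(), lambda obs: str(int(obs) + 1))` collects a three-step trajectory without touching the scheduler, and `Scheduler(total_gpus=8)` can queue a LoRA job behind a full-parameter job without knowing anything about the environment. That independence is the point of the illustration, not the particular interfaces chosen.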
