2604.26805v1 Apr 29, 2026 cs.AI

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

Ming Li

Yang Zhao

Zhipeng Qian

Zihan Liang

Yufei Ma

Chenyi Lei

Jun Zhuang

Shuo Yang

Hong Wan

Yaoliang Wu

Original Abstract

Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottleneck is not reasoning capability but orchestration: selecting, for each operational event, the relevant data (metrics, logs, change events) and the applicable operational knowledge (handbook rules and practitioner experience). Feeding all signals indiscriminately dilutes the relevant evidence and causes hallucination, while manually curating the event-to-(data, knowledge) mapping is intractable under dozens of releases per day. We present Bian Que, an agentic framework with three contributions: (i) a unified operational paradigm that abstracts day-to-day O&M into three canonical patterns: release interception, proactive inspection, and alert root cause analysis; (ii) Flexible Skill Arrangement, where each Skill specifies which data and knowledge to retrieve for a given business-module context and can be automatically generated and updated by LLMs or iteratively refined through natural-language instructions from on-call engineers; (iii) a unified self-evolving mechanism in which a single correction signal drives two parallel pathways: case-memory-to-knowledge distillation and targeted Skill refinement. Deployed on the e-commerce search engine of KuaiShou, a major short-video platform in China, Bian Que reduces alert volume by 75%, achieves 80% root-cause analysis accuracy, and cuts mean time to resolution by over 50%. On offline evaluations, the framework achieves a 99.0% pass rate. Our code is available at https://github.com/benchen4395/BianQue_Assistant.
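To make the orchestration idea concrete, here is a minimal Python sketch of how a Skill might map an event's (pattern, business module) context to the data and knowledge to retrieve, and how one correction signal could feed both a case memory and a targeted Skill update. Everything here is an illustrative assumption, not the Bian Que implementation or the API of the linked repository: the names `Pattern`, `Event`, `Skill`, `SkillRegistry`, `context_for`, `apply_correction`, and `distill`, the exact-match lookup, and source names such as `qps_metric` are hypothetical. The abstract does not say how case memory is distilled into knowledge, so a simple recurrence count stands in for that step, and Skills here are edited by code rather than generated by an LLM.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Pattern(Enum):
    """The three canonical O&M patterns named in the abstract."""
    RELEASE_INTERCEPTION = "release_interception"
    PROACTIVE_INSPECTION = "proactive_inspection"
    ALERT_RCA = "alert_root_cause_analysis"


@dataclass
class Event:
    """An operational event tagged with its pattern and business module."""
    pattern: Pattern
    module: str
    payload: dict = field(default_factory=dict)


@dataclass
class Skill:
    """Hypothetical Skill: which data and knowledge to retrieve for one
    (pattern, business module) context."""
    pattern: Pattern
    module: str
    data_sources: list[str]      # e.g. metric, log, or change-event feeds
    knowledge_keys: list[str]    # e.g. handbook-rule or experience identifiers
    notes: list[str] = field(default_factory=list)  # accumulated corrections


class SkillRegistry:
    """Toy orchestrator (assumption): routes events to Skills, absorbs corrections."""

    def __init__(self) -> None:
        self._skills: dict[tuple[Pattern, str], Skill] = {}
        self.case_memory: list[dict] = []

    def register(self, skill: Skill) -> None:
        self._skills[(skill.pattern, skill.module)] = skill

    def context_for(self, event: Event) -> dict[str, list[str]]:
        """Return only what the matching Skill names, not every signal."""
        skill = self._skills.get((event.pattern, event.module))
        if skill is None:
            return {"data": [], "knowledge": []}
        return {"data": list(skill.data_sources),
                "knowledge": list(skill.knowledge_keys)}

    def apply_correction(self, event: Event, correction: str,
                         add_data: list[str] | None = None) -> None:
        """One correction signal feeds two pathways."""
        # Pathway 1: keep the case so it can later be distilled into knowledge.
        self.case_memory.append({"event": event, "correction": correction})
        # Pathway 2: targeted refinement of the Skill that handled the event.
        skill = self._skills.get((event.pattern, event.module))
        if skill is not None:
            skill.notes.append(correction)
            for source in add_data or []:
                if source not in skill.data_sources:
                    skill.data_sources.append(source)

    def distill(self, min_cases: int = 2) -> list[str]:
        """Stand-in distillation: promote corrections that recur in case memory."""
        counts = Counter(case["correction"] for case in self.case_memory)
        return sorted(text for text, n in counts.items() if n >= min_cases)


# Usage with hypothetical module and source names.
registry = SkillRegistry()
registry.register(Skill(Pattern.ALERT_RCA, "search-ranking",
                        data_sources=["qps_metric", "ranking_logs"],
                        knowledge_keys=["handbook:latency-spike"]))
alert = Event(Pattern.ALERT_RCA, "search-ranking", {"metric": "p99_latency"})
print(registry.context_for(alert))
# {'data': ['qps_metric', 'ranking_logs'], 'knowledge': ['handbook:latency-spike']}
registry.apply_correction(alert, "check recent config changes",
                          add_data=["change_events"])
print(registry.context_for(alert)["data"])
# ['qps_metric', 'ranking_logs', 'change_events']
```

Keying Skills on the (pattern, module) pair keeps the retrieved context narrow, which is the abstract's argument against feeding every signal indiscriminately; a fuzzier matcher or an LLM-driven Skill generator could replace the exact dictionary lookup in this sketch.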
