2603.05134v1 Mar 05, 2026 cs.CL

LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting

Yewen Li (Citations: 208, h-index: 8)

Zhiyi Lyu (Citations: 19, h-index: 2)

Qingpeng Cai (Citations: 68, h-index: 4)

Fei Pan (Citations: 35, h-index: 3)

Bo An (Citations: 37, h-index: 3)

Peng Jiang (Citations: 54, h-index: 5)

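As the scale of ad auctions on online advertising platforms has grown, competition has intensified and manual bidding has become impractical, so auto-bidding systems are needed to help advertisers achieve their economic goals. Current auto-bidding methods use offline reinforcement learning or generative models to optimize bidding strategies, but their black-box training and limited datasets sometimes lead to counterintuitive behavior, and they struggle to understand task status and to generalize in dynamic ad environments. Large language models (LLMs) offer a promising solution by leveraging prior human knowledge and reasoning abilities to improve auto-bidding performance. However, applying LLMs directly to auto-bidding is difficult: competitive auctions require precise actions, and LLMs lack specialized auto-bidding knowledge, which can lead to hallucinations and suboptimal decisions. To address these problems, we propose a hierarchical Large autoBidding Model (LBM) that uses the reasoning capabilities of LLMs to develop a superior auto-bidding strategy. It comprises a high-level LBM-Think model for reasoning and a low-level LBM-Act model for action generation. Specifically, we propose a dual embedding mechanism that efficiently fuses the two modalities, language and numerical inputs, for language-guided training of LBM-Act. We also propose GQPO, an offline reinforcement fine-tuning technique that mitigates LBM-Think's hallucinations and improves decision-making performance without simulation or real-world rollouts. Experiments show that a generative backbone based on LBM is superior, particularly in training efficiency and generalization ability.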

Original Abstract

The growing scale of ad auctions on online advertising platforms has intensified competition, making manual bidding impractical and necessitating auto-bidding to help advertisers achieve their economic goals. Current auto-bidding methods have evolved to use offline reinforcement learning or generative methods to optimize bidding strategies, but they can sometimes behave counterintuitively due to their black-box training and the limited mode coverage of datasets, leading to challenges in understanding task status and in generalizing to dynamic ad environments. Large language models (LLMs) offer a promising solution by leveraging prior human knowledge and reasoning abilities to improve auto-bidding performance. However, directly applying LLMs to auto-bidding faces difficulties due to the need for precise actions in competitive auctions and the lack of specialized auto-bidding knowledge, which can lead to hallucinations and suboptimal decisions. To address these challenges, we propose a hierarchical Large autoBidding Model (LBM) to leverage the reasoning capabilities of LLMs for developing a superior auto-bidding strategy. This includes a high-level LBM-Think model for reasoning and a low-level LBM-Act model for action generation. Specifically, we propose a dual embedding mechanism to efficiently fuse two modalities, language and numerical inputs, for language-guided training of LBM-Act; then, we propose an offline reinforcement fine-tuning technique termed GQPO for mitigating LBM-Think's hallucinations and enhancing decision-making performance without the simulation or real-world rollouts required by previous multi-turn LLM-based methods. Experiments demonstrate the superiority of a generative backbone based on our LBM, especially in training efficiency and generalization ability.
