2602.13367v1 Feb 13, 2026 cs.AI

Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

Cheng Yang (Citations: 135, h-index: 5)
Yang Song (Citations: 2,628, h-index: 7)
Guangyue Peng (Citations: 4, h-index: 1)
Jiaying Zhu (Citations: 9, h-index: 2)
Ran Le (Citations: 13, h-index: 3)
Ruixiang Feng (Citations: 16, h-index: 3)
Tao Zhang (Citations: 61, h-index: 3)
Xiyun Xu (Citations: 5, h-index: 2)
Yiming Jia (Citations: 7, h-index: 2)
Yuntao Wen (Citations: 42, h-index: 3)
Yun Xu (Citations: 260, h-index: 4)
Zekai Wang (Citations: 8, h-index: 2)
Zhenwei An (Citations: 11, h-index: 3)
Zhicong Sun (Citations: 3, h-index: 1)
Zongchao Chen (Citations: 12, h-index: 3)

Original Abstract

We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in Reinforcement Learning, optimizing both correctness and efficiency. In deep search, we perform complex data synthesis and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem-solving. Extensive experimental results show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, even achieving superior performance compared to much larger models, such as Qwen3-30B-A3B. Our results demonstrate that small models can achieve both broad competence and strong specialization simultaneously, redefining the potential of 3B parameter models.
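The abstract does not say how the complexity-aware reward for code generation is computed. As a rough illustration only, the sketch below shows one way a reward could score both correctness and efficiency: correctness is measured by the fraction of unit tests passed, and an efficiency bonus is granted only to fully correct solutions. The names RunResult, complexity_aware_reward, and efficiency_weight, the 0.3 weighting, and the use of runtime relative to a reference solution as the efficiency signal are all assumptions made for this example, not details from the paper.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical record of one sandboxed evaluation of generated code."""
    passed: int        # number of unit tests passed
    total: int         # number of unit tests run
    runtime_s: float   # measured runtime of the candidate solution
    baseline_s: float  # runtime of a reference solution on the same tests


def complexity_aware_reward(result: RunResult, efficiency_weight: float = 0.3) -> float:
    """Illustrative reward in [0, 1]; correctness gates the efficiency bonus.

    This is an assumed formulation, not the authors' reward function.
    """
    if result.total <= 0:
        return 0.0
    correctness = result.passed / result.total
    if correctness < 1.0:
        # Partially correct code gets scaled credit and no efficiency bonus.
        return (1.0 - efficiency_weight) * correctness
    # Speed ratio is capped at 1.0, so beating the baseline earns nothing extra.
    speed = min(1.0, result.baseline_s / max(result.runtime_s, 1e-9))
    return (1.0 - efficiency_weight) + efficiency_weight * speed
```

Gating the bonus on full correctness keeps a fast but wrong solution from outscoring a slow but correct one, which is one plausible reading of "optimizing both correctness and efficiency".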
