2603.14342v1 Mar 15, 2026 cs.CV

AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models

Yutong Lu (Citations: 1,074; h-index: 8)

Juepeng Zheng (Citations: 1,643; h-index: 21)

Jiarui Zhang (Citations: 340; h-index: 10)

Junqi Hu (Citations: 34; h-index: 3)

Zurong Mai (Citations: 5; h-index: 1)

Yuhang Chen (Citations: 5; h-index: 1)

Shuohong Lou (Citations: 5; h-index: 1)

Henglian Huang (Citations: 21; h-index: 2)

Lingyuan Zhao (Citations: 5; h-index: 1)

Jianxi Huang (Citations: 43; h-index: 4)

Haohuan Fu (Citations: 737; h-index: 15)

Abstract

Agricultural multimodal reasoning requires robust spatial understanding across varying scales, from ground-level close-ups to top-down UAV and satellite imagery. Existing Multimodal Large Language Models (MLLMs) suffer from a significant "terrestrial-centric" bias, which causes scale confusion and logic drift during complex agricultural planning. To address this, we introduce AgroOmni (288K), the first large-scale multi-view training corpus designed to capture the diverse spatial topologies and scales of modern precision agriculture. Built on this dataset, we propose AgroNVILA, an MLLM with a novel Perception-Reasoning Decoupling (PRD) architecture. On the perception side, a View-Conditioned Meta-Net (VCMN) injects macroscopic spatial context into visual tokens, resolving scale ambiguities with minimal computational overhead. On the reasoning side, Agriculture-aware Relative Policy Optimization (ARPO) uses reinforcement learning to align the model's decision-making with expert agricultural logic, preventing statistical shortcuts. Extensive experiments demonstrate that AgroNVILA outperforms state-of-the-art MLLMs, achieving significant improvements (+15.18%) in multi-altitude agricultural reasoning, reflecting its robust capability for holistic agricultural spatial planning.
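The abstract does not spell out how ARPO scores candidate answers. As a rough illustration only, the sketch below assumes it follows the group-relative scheme familiar from GRPO-style methods, where several responses are sampled for one prompt and each reward is normalized against the group's mean and standard deviation. The function name `group_relative_advantages`, the `eps` parameter, and the reward values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Assumption: a GRPO-style advantage, (r - group mean) / (group std + eps).
    The paper's actual ARPO formulation may differ.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to a single question.
rewards = [1.0, 0.0, 0.5, 0.0]
advantages = group_relative_advantages(rewards)
```

Under this assumption, answers scoring above the group average receive positive advantages and are reinforced, while a group of identically scored answers contributes no learning signal.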
