2604.23701v1 Apr 26, 2026 cs.CL

Agri-CPJ: A Training-Free Explainable Framework for Agricultural Pest Diagnosis Using Caption-Prompt-Judge and LLM-as-a-Judge

Mukang You

Wentao Zhang

Qi Zhang

Henghua Shen

Zhongzhi He

Keyan Jin

Derek F. Wong

Tao Fang

Mingkun Xu

Original Abstract

Crop disease diagnosis from field photographs faces two recurring problems: models that score well on benchmarks frequently hallucinate species names, and even when their predictions are correct, the reasoning behind them is typically inaccessible to the practitioner. This paper describes Agri-CPJ (Caption-Prompt-Judge), a training-free, few-shot framework in which a large vision-language model first generates a structured morphological caption, iteratively refined through multi-dimensional quality gating, before any diagnostic question is answered. Two candidate responses are then generated from complementary viewpoints, and an LLM judge selects the stronger one based on domain-specific criteria. Caption refinement is the component with the largest individual impact: ablations confirm that skipping it consistently degrades downstream accuracy for both models tested. On CDDMBench, pairing GPT-5-Nano with GPT-5-mini-generated captions yields +22.7 pp in disease classification accuracy and +19.5 points in QA score over no-caption baselines. Evaluated without modification on AgMMU-MCQs, GPT-5-Nano reached 77.84% and Qwen-VL-Chat reached 64.54%, placing them at or above most open-source models of comparable scale despite the format shift from open-ended to multiple-choice questions. The structured caption and the judge's rationale together constitute a readable audit trail: a practitioner who disagrees with a diagnosis can identify the specific caption observation that was incorrect. Code and data are publicly available at https://github.com/CPJ-Agricultural/CPJ-Agricultural-Diagnosis
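The abstract describes the Caption-Prompt-Judge flow only at a high level. The Python sketch below shows one way its control flow could fit together; it is not the authors' implementation (that lives in the linked repository). Everything concrete here is a hypothetical assumption: the gating dimensions (`completeness`, `specificity`, `accuracy`), the 0.8 threshold, the three-round cap, the two viewpoint labels, and the injected `generate`, `score`, `answer`, and `judge` callables, which stand in for the vision-language model and LLM calls. The toy stubs in the `__main__` block only exercise the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical gating dimensions and viewpoints; the paper's actual criteria may differ.
DIMENSIONS = ("completeness", "specificity", "accuracy")
VIEWPOINTS = ("symptom-focused", "differential-diagnosis")


@dataclass
class AuditTrail:
    """Caption revisions, their gate scores, and the judge's rationale."""
    captions: List[str] = field(default_factory=list)
    scores: List[Dict[str, float]] = field(default_factory=list)
    rationale: str = ""


def refine_caption(
    generate: Callable[[str], str],
    score: Callable[[str], Dict[str, float]],
    trail: AuditTrail,
    threshold: float = 0.8,  # hypothetical gate threshold
    max_rounds: int = 3,     # hypothetical iteration cap
) -> str:
    """Regenerate the caption until every dimension clears the gate or rounds run out."""
    feedback, caption = "", ""
    for _ in range(max_rounds):
        caption = generate(feedback)
        scores = score(caption)
        trail.captions.append(caption)
        trail.scores.append(scores)
        failing = [d for d in DIMENSIONS if scores.get(d, 0.0) < threshold]
        if not failing:
            break  # caption passes every quality gate
        feedback = "Revise for: " + ", ".join(failing)
    return caption


def caption_prompt_judge(
    question: str,
    generate: Callable[[str], str],
    score: Callable[[str], Dict[str, float]],
    answer: Callable[[str, str, str], str],
    judge: Callable[[str, str, List[str]], Tuple[int, str]],
) -> Tuple[str, AuditTrail]:
    """Caption first, then answer from two viewpoints, then let the judge pick one."""
    trail = AuditTrail()
    caption = refine_caption(generate, score, trail)
    candidates = [answer(caption, question, view) for view in VIEWPOINTS]
    best, rationale = judge(caption, question, candidates)
    trail.rationale = rationale
    return candidates[best], trail


if __name__ == "__main__":
    drafts = iter([
        "Leaf with some spots.",
        "Tomato leaf; brown concentric-ring lesions with yellow halos on older leaves.",
    ])

    def toy_generate(feedback: str) -> str:
        return next(drafts)  # a real system would prompt a VLM with the feedback

    def toy_score(caption: str) -> Dict[str, float]:
        rich = "concentric" in caption
        return {d: 0.9 if rich else 0.4 for d in DIMENSIONS}

    def toy_answer(caption: str, question: str, view: str) -> str:
        if view == "symptom-focused":
            return "Early blight: concentric-ring lesions on older leaves."
        return "Possibly septoria or early blight."

    def toy_judge(caption: str, question: str, candidates: List[str]) -> Tuple[int, str]:
        # Toy criterion: prefer the answer that cites more caption observations.
        hits = [sum(w in c for w in ("concentric", "halo")) for c in candidates]
        best = hits.index(max(hits))
        return best, f"Candidate {best} cites {hits[best]} caption observation(s)."

    diagnosis, trail = caption_prompt_judge(
        "What disease is this?", toy_generate, toy_score, toy_answer, toy_judge
    )
    print(diagnosis)
    print(len(trail.captions), trail.rationale)
```

Injecting each model call as a callable keeps the control flow testable and lets real model clients replace the stubs. The returned `AuditTrail` keeps every caption revision, its gate scores, and the judge's rationale, which is the kind of record the abstract describes as a readable audit trail.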