2604.04997v1 Apr 05, 2026 cs.IR

Evaluation of Embedding-Based and Generative Methods for LLM-Driven Document Classification: Opportunities and Challenges

Hao Liu

Rong Lu

S. Hou

Original Abstract

This work presents a comparative analysis of embedding-based and generative models for classifying geoscience technical documents. Using a multi-disciplinary benchmark dataset, we evaluated the trade-offs between model accuracy, stability, and computational cost. We find that generative Vision-Language Models (VLMs) like Qwen2.5-VL, enhanced with Chain-of-Thought (CoT) prompting, achieve superior zero-shot accuracy (82%) compared to state-of-the-art multimodal embedding models like QQMM (63%). We also demonstrate that while supervised fine-tuning (SFT) can improve VLM performance, it is sensitive to training data imbalance.
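
As a rough illustration of what zero-shot, embedding-based classification generally involves, the sketch below assigns a document to the class whose embedding is most similar to it by cosine similarity. This is a generic nearest-label scheme and not necessarily the pipeline evaluated in the paper. The class names and the 3-dimensional vectors are invented for illustration; in practice both document and label vectors would come from an embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(doc_vec, label_vecs):
    """Return the label whose embedding is closest to the document embedding."""
    return max(label_vecs, key=lambda name: cosine(doc_vec, label_vecs[name]))


# Hypothetical class names and toy vectors (assumptions, not from the paper).
label_vecs = {
    "geology": [0.9, 0.1, 0.0],
    "geophysics": [0.1, 0.9, 0.1],
    "drilling": [0.0, 0.2, 0.9],
}

# Toy stand-in for the embedding of one document.
doc_vec = [0.2, 0.8, 0.1]

predicted = classify(doc_vec, label_vecs)
print(predicted)
```

A generative VLM, by contrast, would be prompted with the document and the list of classes and asked to reason step by step before naming one, which is where the Chain-of-Thought prompting mentioned in the abstract comes in.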
