2603.29232v1 Mar 31, 2026 cs.CL

Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs

Yuyu Luo

Nan Tang

Zhengxuan Zhang

Zhuowen Liang

Xiaotian Lin

Haixun Wang

Original Abstract

Large language models (LLMs) are widely applied to data analytics over documents, yet direct reasoning over long, noisy documents remains brittle and error-prone. Hence, we study document question answering (QA) that consolidates dispersed evidence into a structured output (e.g., a table, graph, or chunks) to support reliable, verifiable QA. We propose a two-pillar framework, LiteCoST, to achieve both high accuracy and low latency with small language models (SLMs). Pillar 1: Chain-of-Structured-Thought (CoST). We introduce a CoST template, a schema-aware instruction that guides a strong LLM to produce both a step-wise CoST trace and the corresponding structured output. The process induces a minimal structure, normalizes entities/units, aligns records, serializes the output, and verifies/refines it, yielding auditable supervision. Pillar 2: SLM fine-tuning. The compact models are trained on LLM-generated CoST data in two stages: Supervised Fine-Tuning for structural alignment, followed by Group Relative Policy Optimization (GRPO) incorporating triple rewards for answer/format quality and process consistency. By distilling structure-first behavior into SLMs, this approach achieves LLM-comparable quality on multi-domain long-document QA using 3B/7B SLMs, while delivering 2-4x lower latency than GPT-4o and DeepSeek-R1 (671B). The code is available at https://github.com/HKUSTDial/LiteCoST.
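The CoST template is a schema-aware instruction given to a strong LLM, whose step-wise trace and structured output then serve as supervision for the SLM. The Python sketch below shows one way such a prompt and the resulting SFT pair could be assembled. It is not the released LiteCoST code: the Schema class, the build_cost_prompt and to_sft_example helpers, the step wording, and the <think>/<answer> tags are all hypothetical.

```python
import json
from dataclasses import dataclass, field

# Hypothetical step wording, paraphrasing the five CoST steps listed in the abstract.
COST_STEPS = (
    "Induce a minimal structure that is sufficient to answer the question.",
    "Normalize entities and units.",
    "Align records that describe the same entity.",
    "Serialize the structure.",
    "Verify the structure against the document and refine it if needed.",
)
STRUCTURE_TYPES = {"table", "graph", "chunks"}


@dataclass
class Schema:
    """Target structure for one question (hypothetical representation)."""
    structure: str
    fields: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.structure not in STRUCTURE_TYPES:
            raise ValueError(f"unsupported structure type: {self.structure!r}")


def build_cost_prompt(question: str, document: str, schema: Schema) -> str:
    """Assemble a schema-aware instruction asking for a trace plus structured output."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(COST_STEPS, start=1))
    return (
        "Read the document and answer the question.\n"
        f"Target structure: {schema.structure} with fields {json.dumps(schema.fields)}\n"
        "Reason inside <think>...</think>, following these steps in order:\n"
        f"{steps}\n"
        "End the reasoning with the serialized structure, then give the final answer "
        "inside <answer>...</answer>.\n\n"
        f"Document:\n{document}\n\nQuestion: {question}\n"
    )


def to_sft_example(prompt: str, trace: str, structured_output: str, answer: str) -> dict[str, str]:
    """Package one teacher-generated CoST sample as a prompt/completion pair for SFT."""
    completion = f"<think>{trace}\n{structured_output}</think>\n<answer>{answer}</answer>"
    return {"prompt": prompt, "completion": completion}


schema = Schema("table", ["company", "year", "revenue_usd"])
prompt = build_cost_prompt("Which company had higher revenue in 2023?", "(long report text)", schema)
example = to_sft_example(
    prompt,
    trace="Structure: one row per company-year. Normalize: convert revenue to USD. Verify: rows match the report.",
    structured_output="company,year,revenue_usd\nAcme,2023,1200000000",
    answer="Acme",
)
print(example["completion"])
```

Keeping the schema as an explicit object makes it easy to validate the requested structure type (table, graph, or chunks) before any LLM call is made.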
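The structuring steps named in the abstract (induce a minimal structure, normalize entities/units, align records, serialize, verify/refine) can be illustrated on a toy table. The following Python sketch assumes the minimal structure is an entity/attribute/value table and applies simple hypothetical rules: an alias map for entities, scale words such as "million" for units, and CSV for serialization. The helper names are illustrative and do not come from the paper or its repository.

```python
import csv
import io
import re
from dataclasses import dataclass

# Hypothetical scale words; the paper's actual normalization rules are not given here.
SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
NUMBER = re.compile(r"([-+]?\d[\d,]*\.?\d*)\s*(thousand|million|billion)?", re.IGNORECASE)


@dataclass(frozen=True)
class Record:
    entity: str
    attribute: str
    value: float


def normalize_entity(name: str, aliases: dict[str, str]) -> str:
    """Collapse whitespace and case, then map known aliases to one canonical name."""
    key = " ".join(name.split()).casefold()
    return aliases.get(key, key)


def normalize_value(text: str) -> float:
    """Parse strings such as '$1.2 billion' or '1,200 million' into a plain number."""
    match = NUMBER.search(text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    number = float(match.group(1).replace(",", ""))
    return number * SCALE.get((match.group(2) or "").lower(), 1.0)


def align(raw, aliases):
    """Merge mentions with the same (entity, attribute) key and collect conflicts."""
    merged: dict[tuple[str, str], float] = {}
    conflicts: list[tuple[str, str]] = []
    for entity, attribute, value_text in raw:
        key = (normalize_entity(entity, aliases), attribute.strip().casefold())
        value = normalize_value(value_text)
        if key in merged and abs(merged[key] - value) > 1e-9 * max(1.0, abs(value)):
            conflicts.append(key)
        merged.setdefault(key, value)  # keep the first mention; verification flags conflicts
    records = [Record(e, a, v) for (e, a), v in sorted(merged.items())]
    return records, conflicts


def serialize(records: list[Record]) -> str:
    """Serialize aligned records as CSV, one of several possible table formats."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["entity", "attribute", "value"])
    for r in records:
        writer.writerow([r.entity, r.attribute, f"{r.value:.15g}"])
    return buffer.getvalue()


def verify(records, conflicts, required_attributes) -> list[str]:
    """List problems to fix in a refine pass; an empty list means the table passes."""
    problems = [f"conflicting values for {e}/{a}" for e, a in conflicts]
    present = {(r.entity, r.attribute) for r in records}
    for entity in sorted({r.entity for r in records}):
        for attribute in required_attributes:
            if (entity, attribute) not in present:
                problems.append(f"missing {attribute} for {entity}")
    return problems


aliases = {"acme corp.": "acme", "acme corporation": "acme"}
raw = [
    ("Acme Corporation", "Revenue", "$1.2 billion"),
    ("ACME  Corp.", "revenue", "1,200 million"),
    ("Globex", "revenue", "350 million"),
]
records, conflicts = align(raw, aliases)
print(serialize(records))
print(verify(records, conflicts, ["revenue"]))
```

Here the two Acme mentions collapse into one row because their names and units normalize to the same key and value; a mismatched value would instead be reported by verify for a later refinement step.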
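The second training stage uses Group Relative Policy Optimization (GRPO), which scores a group of sampled outputs for the same question and standardizes each reward against the group's mean and standard deviation, so no learned value model is needed. The Python sketch below combines three rewards, for answer quality, format quality, and process consistency, as the abstract describes. The concrete definitions (normalized exact match, a <think>/<answer> layout, in-order keyword matching as a stand-in for process consistency) and the weights (1.0, 0.5, 0.5) are hypothetical; the paper's rewards may be defined differently.

```python
import re
import statistics

# Hypothetical output format and step keywords; the paper's exact reward definitions may differ.
OUTPUT = re.compile(r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$", re.DOTALL)
COST_STEPS = ("structure", "normalize", "align", "serialize", "verify")


def _norm(text: str) -> str:
    return " ".join(text.casefold().split())


def format_reward(output: str) -> float:
    """1.0 if the output has a <think> trace followed by an <answer>, else 0.0."""
    return 1.0 if OUTPUT.match(output.strip()) else 0.0


def answer_reward(output: str, gold: str) -> float:
    """1.0 for a normalized exact match with the gold answer, else 0.0."""
    match = OUTPUT.match(output.strip())
    return 1.0 if match and _norm(match.group(2)) == _norm(gold) else 0.0


def process_reward(output: str, steps=COST_STEPS) -> float:
    """Fraction of expected step keywords found, in order, inside the trace."""
    match = OUTPUT.match(output.strip())
    if match is None:
        return 0.0
    trace, pos, hits = match.group(1).casefold(), 0, 0
    for step in steps:
        idx = trace.find(step, pos)
        if idx >= 0:
            hits += 1
            pos = idx + len(step)
    return hits / len(steps)


def total_reward(output: str, gold: str, weights=(1.0, 0.5, 0.5)) -> float:
    """Weighted sum of answer, format, and process rewards (weights are assumptions)."""
    w_ans, w_fmt, w_proc = weights
    return (w_ans * answer_reward(output, gold)
            + w_fmt * format_reward(output)
            + w_proc * process_reward(output))


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


group = [
    "<think>structure a table, normalize units, align rows, serialize, verify</think><answer>42</answer>",
    "<think>normalize, then verify</think><answer>42</answer>",
    "<answer>42</answer>",
    "<think>structure, align, verify</think><answer>41</answer>",
]
rewards = [total_reward(o, gold="42") for o in group]
print(rewards)
print(group_advantages(rewards))
```

In this toy group, the output that follows every step and answers correctly receives the largest advantage, while the malformed output without a trace receives the smallest.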
