2605.14857v1 May 14, 2026 cs.AI

A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions

Zhenglin Huang

Yu Zhang

Dongjiang Zhuang

Qu Zhou

Jun-Wei Wu

Jing Cao

Kai Chen

Original Abstract

Harmonized System (HS) tariff classification is a high-stakes, expert-level task in which a free-form product description must be mapped to a specific six- or eight-digit code under the General Interpretive Rules (GIR), section notes, chapter notes, and Explanatory Notes. The difficulty lies not in knowledge volume but in *multi-dimensional rule reasoning*: a correct classification must simultaneously satisfy competing priority rules along several axes, including material, form, function, essential character, the part-versus-whole boundary, and specific listings versus residual headings. End-to-end prompting of large language models characteristically fails by resolving one axis while ignoring the priority constraints on the others. In contrast to self-planning agents, we present a *deterministic agentic workflow*: the control flow is fixed, language-model calls are confined to narrow stages, and reflection and verification are retained as local mechanisms. This design makes the workflow interpretable by construction: each decision is decomposed into stage-wise structured outputs with verbatim citations of the chapter or section notes that bear on it. The architecture combines offline knowledge engineering of the Chinese HS tariff with an online six-stage pipeline. Evaluated on HSCodeComp at six-digit granularity with Qwen3.6-plus, the workflow reaches 75.0% top-1 and 91.5% top-3 accuracy at the four-digit level, and 64.2% top-1 and 78.3% top-3 accuracy at the six-digit level. An open-weight Qwen3.6-27B-FP8 backbone in non-thinking mode achieves 84.2% four-digit and 77.4% six-digit top-1 agreement with the frontier model. A two-stage manual audit of 226 six-digit disagreements suggests that a non-trivial fraction of HSCodeComp ground-truth labels may deviate from the HS general rules; full adjudication records are released in the appendix as preliminary findings for community review.
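
The contrast between a deterministic workflow and a self-planning agent can be illustrated with a small sketch. The abstract does not name the six stages, so everything below is hypothetical: `StageOutput`, `run_workflow`, the two toy stages `stage_material` and `stage_heading`, and the toy note text are assumptions, and a deterministic stub stands in for each language-model call. It shows three properties the abstract describes: the stage order is fixed in code, each stage emits a structured record with citations, and verification plus reflection are a bounded local retry that checks whether cited text appears verbatim in the note corpus.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StageOutput:
    """Structured record emitted by one stage: its decision and cited note text."""

    stage: str
    decision: str
    citations: List[str] = field(default_factory=list)


# Hypothetical stage signature: (description, trace so far, feedback) -> output.
# `feedback` is None on the first attempt and a failure message on a retry.
Stage = Callable[[str, List[StageOutput], Optional[str]], StageOutput]


def run_workflow(
    description: str,
    stages: List[Stage],
    notes: Dict[str, str],
    max_retries: int = 1,
) -> List[StageOutput]:
    """Run `stages` in a fixed order; no planner chooses the next step."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    trace: List[StageOutput] = []
    for stage in stages:
        feedback: Optional[str] = None
        for _ in range(max_retries + 1):
            out = stage(description, trace, feedback)
            # Local verification: each citation must occur verbatim in some note.
            missing = [c for c in out.citations
                       if not any(c in text for text in notes.values())]
            if not missing:
                break
            # Local reflection: the failure is fed back to this stage only.
            feedback = f"not found verbatim in notes: {missing}"
        else:
            raise RuntimeError(f"stage {out.stage!r} failed citation check")
        trace.append(out)
    return trace


# Toy note corpus for illustration only (not real tariff text).
TOY_NOTES = {"toy-note-1": "garments of knitted fabric are classified by material"}


def stage_material(desc: str, trace: List[StageOutput],
                   feedback: Optional[str]) -> StageOutput:
    # Stand-in for a narrow LLM call; plain string matching here.
    material = "cotton" if "cotton" in desc.lower() else "other"
    return StageOutput("material", material, ["garments of knitted fabric"])


def stage_heading(desc: str, trace: List[StageOutput],
                  feedback: Optional[str]) -> StageOutput:
    material = trace[-1].decision
    # The first attempt paraphrases the note; the retry quotes it verbatim.
    cite = ("knitted garments are classified by material"
            if feedback is None else "classified by material")
    return StageOutput("heading", f"toy-heading-for-{material}", [cite])


trace = run_workflow("Cotton knitted T-shirt", [stage_material, stage_heading], TOY_NOTES)
for step in trace:
    print(step.stage, step.decision, step.citations)
```

Here the heading stage's first citation is a paraphrase, so the local check rejects it and the same stage is retried once with feedback; control never jumps to a different stage. Because every stage's output is kept in `trace`, each decision can be inspected together with the text it cites.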
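
The four- and six-digit figures above are prefix-match scores over ranked code lists. A minimal sketch of how such scores can be computed, assuming each system returns a ranked list of HS codes and that a top-k hit at d digits means the reference code's first d digits equal those of one of the first k predictions. How the authors handle duplicate prefixes within the top k is not stated, so `prefix_topk_accuracy` and `_digits` are hypothetical helpers, and the codes below are invented toy data, not HSCodeComp items. Using one model's top-1 codes as the reference turns the same function into a top-1 agreement score.

```python
from typing import Sequence


def _digits(code: str) -> str:
    """Drop separators so '8471.30' and '847130' compare equal."""
    return "".join(ch for ch in code if ch.isdigit())


def prefix_topk_accuracy(
    predictions: Sequence[Sequence[str]],
    reference: Sequence[str],
    digits: int,
    k: int,
) -> float:
    """Share of items whose reference code, cut to `digits`, matches any of the
    first k predicted codes cut to the same length."""
    if len(predictions) != len(reference):
        raise ValueError("predictions and reference must have the same length")
    if not reference:
        return 0.0
    hits = 0
    for ranked, ref in zip(predictions, reference):
        target = _digits(ref)[:digits]
        # Assumption: truncate the first k predictions, without deduplicating first.
        candidates = {_digits(code)[:digits] for code in ranked[:k]}
        if target in candidates:
            hits += 1
    return hits / len(reference)


# Toy data, invented for illustration (not from HSCodeComp).
preds = [
    ["847130", "847141", "847150"],
    ["610910", "610990", "620520"],
    ["940360", "940330", "940390"],
]
gold = ["847130", "610990", "940171"]

for digits in (4, 6):
    for k in (1, 3):
        acc = prefix_topk_accuracy(preds, gold, digits, k)
        print(f"{digits}-digit top-{k}: {acc:.3f}")

# Agreement between two systems: treat one system's top-1 as the reference.
frontier_top1 = ["847130", "610990", "940330"]
open_weight = [["847130"], ["610910"], ["940360"]]
print(prefix_topk_accuracy(open_weight, frontier_top1, 4, 1))  # 1.0
print(prefix_topk_accuracy(open_weight, frontier_top1, 6, 1))
```

For the same predictions, a six-digit match implies a four-digit match, so under this definition the four-digit score can never fall below the six-digit score. This is consistent with the ordering of the reported accuracy and agreement figures.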
