2601.12641v1 Jan 19, 2026 cs.AI

STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models

Xiangyu Shi, Xu Zhao, Payal Mohapatra, Daniel Quispe, Kojo Welbeck, Jian Cao, Wei Chen, S. Zhan, P. Guo, Qi Zhu, Junyan Ding

Abstract

Computer-aided design (CAD) is vital to modern manufacturing, yet model creation remains labor-intensive and expertise-heavy. To enable non-experts to translate intuitive design intent into manufacturable artifacts, recent large language model (LLM)-based text-to-CAD efforts focus on command sequences or script-based formats such as CadQuery. However, these formats are kernel-dependent and lack universality for manufacturing. In contrast, the Standard for the Exchange of Product Data (STEP, ISO 10303) is a widely adopted, neutral boundary representation (B-rep) file format that is directly compatible with manufacturing, but its graph-structured, cross-referenced nature poses unique challenges for auto-regressive LLMs. To address this, we curate a dataset of ~40K STEP-caption pairs and introduce novel preprocessing tailored to the graph-structured format of STEP: a depth-first search (DFS)-based reserialization that linearizes cross-references while preserving locality, and chain-of-thought (CoT)-style structural annotations that guide global coherence. We integrate retrieval-augmented generation (RAG) to ground predictions in relevant examples during supervised fine-tuning, and refine generation quality through reinforcement learning (RL) with a Chamfer Distance-based geometric reward. Experiments demonstrate consistent gains of STEP-LLM in geometric fidelity over the Text2CAD baseline, with improvements arising from multiple stages of our framework: the RAG module substantially enhances completeness and renderability, the DFS-based reserialization strengthens overall accuracy, and RL further reduces geometric discrepancy. Both quantitative metrics and visual comparisons confirm that STEP-LLM generates shapes with higher fidelity than Text2CAD. These results show the feasibility of LLM-driven STEP model generation from natural language and point to its potential to democratize CAD design for manufacturing.
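
To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy STEP-like data section in which every line has the form `#id = ENTITY(...);` and every `#id` reference resolves to an entity in the same list. The function names `parse_entities`, `dfs_reserialize`, and `chamfer_distance` are hypothetical, the parent-before-children visiting order is an assumption, and the Chamfer distance uses one common convention (sum of the two mean squared nearest-neighbor distances), which may differ from the paper's exact definition.

```python
import math
import re

# Matches entity references such as "#12". Toy assumption: "#<digits>" never
# appears inside a string literal, which real STEP files do not guarantee.
REF = re.compile(r"#(\d+)")


def parse_entities(lines):
    """Map entity id -> (body text, list of referenced entity ids)."""
    entities = {}
    for line in lines:
        head, body = line.split("=", 1)
        eid = int(head.strip().lstrip("#"))
        body = body.strip().rstrip(";")
        entities[eid] = (body, [int(r) for r in REF.findall(body)])
    return entities


def dfs_reserialize(lines):
    """Reorder and renumber entities so each one is followed by what it references."""
    entities = parse_entities(lines)
    referenced = {r for _, refs in entities.values() for r in refs}
    roots = [eid for eid in entities if eid not in referenced]

    order, seen = [], set()

    def visit(eid):
        if eid in seen or eid not in entities:
            return
        seen.add(eid)
        order.append(eid)  # parent first (assumed order), then its references
        for child in entities[eid][1]:
            visit(child)

    for root in roots:
        visit(root)
    for eid in entities:  # picks up anything unreachable from a root
        visit(eid)

    new_id = {old: i + 1 for i, old in enumerate(order)}

    def renumber(match):
        old = int(match.group(1))
        return f"#{new_id.get(old, old)}"

    return [f"#{new_id[old]} = {REF.sub(renumber, entities[old][0])};" for old in order]


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Illustrative input: a plane defined through an axis placement.
step_lines = [
    "#1 = CARTESIAN_POINT('', (0.0, 0.0, 0.0));",
    "#2 = DIRECTION('', (0.0, 0.0, 1.0));",
    "#3 = DIRECTION('', (1.0, 0.0, 0.0));",
    "#4 = AXIS2_PLACEMENT_3D('', #1, #2, #3);",
    "#5 = PLANE('', #4);",
]
reordered = dfs_reserialize(step_lines)

# Two small point sets standing in for sampled surfaces of two shapes.
points_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
points_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
distance = chamfer_distance(points_a, points_b)
```

In this toy input the unreferenced `PLANE` entity becomes `#1` and the entities it depends on follow immediately after it, which is the locality the abstract refers to. A reinforcement-learning reward could then be any decreasing function of `distance`; the abstract does not state which form the paper uses, so none is assumed here.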

