2603.12091v1 Mar 12, 2026 cs.LG

Resource-Efficient Iterative LLM-Based NAS with Feedback Memory

R. Timofte (Citations: 57,078; h-index: 101)

D. Ignatov (Citations: 74; h-index: 3)

Xiaojie Guo (Citations: 2; h-index: 1)

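Neural architecture search (NAS) automates network design, but existing methods require substantial computational resources. This work proposes a closed-loop pipeline that uses large language models (LLMs) to iteratively generate, evaluate, and refine convolutional neural network architectures for image classification, running on a single consumer GPU without LLM fine-tuning. The core idea is a historical feedback memory inspired by Markov chains: a sliding window holding the $K{=}5$ most recent improvement attempts keeps the context size constant while still supplying enough information for iterative learning. Unlike earlier LLM-based optimizers, which discard failed trajectories, each record is a structured diagnostic triple (the identified problem, the proposed modification, and the outcome), so code execution failures count as important learning signals. Dual-LLM specialization reduces the cognitive load of each call: a code generator produces executable PyTorch architectures, and a prompt improver performs the diagnostic reasoning. Because the LLM and architecture training share limited VRAM, the search implicitly favors small, hardware-efficient models suited to edge deployment. The study evaluates three frozen instruction-tuned LLMs (${\leq}7$B parameters) over up to 2000 iterations in an unconstrained, open code space, using one-epoch proxy accuracy on CIFAR-10, CIFAR-100, and ImageNette as a fast ranking signal. On CIFAR-10, DeepSeek-Coder-6.7B improves from 28.2% to 69.2%, Qwen2.5-7B from 50.0% to 71.5%, and GLM-5 from 43.2% to 62.0%. A full 2000-iteration search completes in about 18 GPU hours on a single RTX 4090, offering a low-cost, reproducible, hardware-aware paradigm for LLM-driven NAS without cloud infrastructure.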
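
The sliding-window feedback memory described above can be illustrated with a short Python sketch. This is not the authors' implementation: the names `DiagnosticTriple` and `FeedbackMemory`, the field names, and the text layout produced by `render` are all assumptions made for illustration. The only details taken from the abstract are the window size $K{=}5$ and the three-part record (problem, modification, outcome).

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticTriple:
    """One history entry (hypothetical field names)."""
    problem: str       # what the diagnosis identified
    modification: str  # what change was suggested
    outcome: str       # resulting accuracy, or the error if execution failed


class FeedbackMemory:
    """Keeps only the K most recent attempts, so prompt context stays bounded."""

    def __init__(self, k: int = 5) -> None:
        # deque(maxlen=k) silently drops the oldest entry once k is exceeded.
        self._window = deque(maxlen=k)

    def record(self, problem: str, modification: str, outcome: str) -> None:
        # Failed runs are stored exactly like successful ones.
        self._window.append(DiagnosticTriple(problem, modification, outcome))

    def render(self) -> str:
        """Format the window as plain text for inclusion in an LLM prompt."""
        return "\n".join(
            f"[{i}] problem: {e.problem} | change: {e.modification} | outcome: {e.outcome}"
            for i, e in enumerate(self._window, start=1)
        )

    def __len__(self) -> int:
        return len(self._window)


# Usage: eight attempts are recorded, but only the last five are retained.
memory = FeedbackMemory(k=5)
for step in range(8):
    failed = step % 2 == 1
    outcome = "RuntimeError: shape mismatch" if failed else f"accuracy={0.30 + 0.05 * step:.2f}"
    memory.record(f"issue {step}", f"fix {step}", outcome)
print(memory.render())
```

Because the window has a fixed length, the rendered history grows with the size of individual entries but not with the number of iterations, which matches the constant-context property the abstract claims.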

Original Abstract

Neural Architecture Search (NAS) automates network design, but conventional methods demand substantial computational resources. We propose a closed-loop pipeline leveraging large language models (LLMs) to iteratively generate, evaluate, and refine convolutional neural network architectures for image classification on a single consumer-grade GPU without LLM fine-tuning. Central to our approach is a historical feedback memory inspired by Markov chains: a sliding window of $K{=}5$ recent improvement attempts keeps context size constant while providing sufficient signal for iterative learning. Unlike prior LLM optimizers that discard failure trajectories, each history entry is a structured diagnostic triple (recording the identified problem, suggested modification, and resulting outcome), treating code execution failures as first-class learning signals. A dual-LLM specialization reduces per-call cognitive load: a Code Generator produces executable PyTorch architectures while a Prompt Improver handles diagnostic reasoning. Since both the LLM and architecture training share limited VRAM, the search implicitly favors compact, hardware-efficient models suited to edge deployment. We evaluate three frozen instruction-tuned LLMs (${\leq}7$B parameters) across up to 2000 iterations in an unconstrained open code space, using one-epoch proxy accuracy on CIFAR-10, CIFAR-100, and ImageNette as a fast ranking signal. On CIFAR-10, DeepSeek-Coder-6.7B improves from 28.2% to 69.2%, Qwen2.5-7B from 50.0% to 71.5%, and GLM-5 from 43.2% to 62.0%. A full 2000-iteration search completes in ${\approx}18$ GPU hours on a single RTX 4090, establishing a low-budget, reproducible, and hardware-aware paradigm for LLM-driven NAS without cloud infrastructure.

7 Citations

0 Influential

30 Altmetric

157.0 Score
