2601.05467v3 Jan 09, 2026 cs.SE

STELP: Secure Transpilation and Execution of LLM-Generated Programs

Sahil Wadhwa

Swapnil Shinde

Andy Luo

Akshay Gupta

M. Sorower

Abstract

The rapid evolution of Large Language Models (LLMs) has brought major advances in reasoning, planning, and function-calling capabilities. Multi-agent collaborative frameworks built on such LLMs place them at the center of software development tasks such as code generation. However, using LLM-generated code directly in production software development systems is problematic. The code may be unstable or erroneous and may contain vulnerabilities such as data poisoning, malicious attacks, and hallucinations that could lead to widespread system malfunctions. This impedes the adoption of LLM-generated code in production AI systems where human code review and traditional secure testing tools are impractical or untrustworthy. In this paper, we discuss the safety and reliability problems that arise when executing LLM-generated code and propose a Secure Transpiler and Executor of LLM-Generated Program (STELP), which executes LLM-generated code in a controlled and safe manner. STELP secures autonomous production AI systems that involve code generation, filling the critical gap left by the impracticality or limitations of traditional secure testing methodologies and human oversight. Target applications include headless code generation and execution, and LLMs that produce executable code snippets as an action plan to be run in real time. We contribute a human-validated dataset of insecure code snippets and benchmark our approach on publicly available datasets for correctness, safety, and latency. Our results show that our approach outperforms an existing method by a significant margin, particularly in its ability to safely execute risky code snippets. Warning: This paper contains malicious code snippets that should be run with caution.
