2603.28345v1 Mar 30, 2026 cs.SE

Crossing the NL/PL Divide: Information Flow Analysis Across the NL/PL Boundary in LLM-Integrated Code

Ruijie Meng

Zihao Xu

Xiaoyang Cheng

Yuekang Li

Abstract

LLM API calls are becoming a ubiquitous program construct, yet they create a boundary that no existing program analysis can cross: runtime values enter a natural-language prompt, undergo opaque processing inside the LLM, and re-emerge as code, SQL, JSON, or text that the program consumes. Every analysis that tracks data across function boundaries, including taint analysis, program slicing, dependency analysis, and change-impact analysis, relies on dataflow summaries of callee behavior. LLM calls have no such summaries, breaking all of these analyses at what we call the NL/PL boundary. We present the first information flow method to bridge this boundary. Grounded in quantitative information flow theory, our taxonomy defines 24 labels along two orthogonal dimensions: information preservation level (from lexically preserved to fully blocked) and output modality (natural language, structured format, executable artifact). We label 9,083 placeholder-output pairs from 4,154 real-world Python files and validate reliability with Cohen's $\kappa = 0.82$ and near-complete coverage (0.01% unclassifiable). We demonstrate the taxonomy's utility on two downstream applications: (1) a two-stage taint propagation pipeline combining taxonomy-based filtering with LLM verification achieves $F_1 = 0.923$ on 353 expert-annotated pairs, with cross-language validation on six real-world OpenClaw prompt injection cases further confirming effectiveness; (2) taxonomy-informed backward slicing reduces slice size by a mean of 15% in files containing non-propagating placeholders. Per-label analysis reveals that four blocked labels account for nearly all non-propagating cases, providing actionable filtering criteria for tool builders.
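
The two-stage taint propagation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every class and function name below (`Preservation`, `Modality`, `FlowLabel`, `Pair`, `taint_propagates`, `substring_verifier`) is hypothetical. The abstract names only the two endpoints of the preservation dimension, so the sketch models just those two and collapses the four blocked labels into one. A plain substring check stands in for the LLM verification stage.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Preservation(Enum):
    # Only the two endpoints are named in the abstract; the intermediate
    # levels of the real taxonomy are not listed there, so none are invented.
    LEXICALLY_PRESERVED = "lexically_preserved"
    FULLY_BLOCKED = "fully_blocked"


class Modality(Enum):
    # The three output modalities named in the abstract.
    NATURAL_LANGUAGE = "natural_language"
    STRUCTURED_FORMAT = "structured_format"
    EXECUTABLE_ARTIFACT = "executable_artifact"


@dataclass(frozen=True)
class FlowLabel:
    preservation: Preservation
    modality: Modality


@dataclass(frozen=True)
class Pair:
    value: str       # runtime value substituted into the prompt placeholder
    output: str      # text returned by the LLM call
    label: FlowLabel  # taxonomy label assigned to this placeholder-output pair


def taint_propagates(pair: Pair, verify: Callable[[Pair], bool]) -> bool:
    """Decide whether taint on the placeholder reaches the LLM output."""
    # Stage 1: taxonomy-based filter. A blocked label is treated as
    # non-propagating, so the verifier is never consulted.
    if pair.label.preservation is Preservation.FULLY_BLOCKED:
        return False
    # Stage 2: defer to a verifier (an LLM in the paper; any callable here).
    return verify(pair)


def substring_verifier(pair: Pair) -> bool:
    # Hypothetical stand-in for LLM verification: taint is assumed to
    # propagate when the runtime value appears verbatim in the output.
    return pair.value in pair.output


blocked = Pair(
    value="DROP TABLE users",
    output="Sure, here is a summary.",
    label=FlowLabel(Preservation.FULLY_BLOCKED, Modality.NATURAL_LANGUAGE),
)
preserved = Pair(
    value="users",
    output="SELECT * FROM users",
    label=FlowLabel(Preservation.LEXICALLY_PRESERVED, Modality.EXECUTABLE_ARTIFACT),
)
print(taint_propagates(blocked, substring_verifier))
print(taint_propagates(preserved, substring_verifier))
```

In this sketch the cheap label check runs first, so the more expensive verifier is invoked only for pairs whose label does not already rule out propagation.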
