2605.06601v1 May 07, 2026 cs.CR

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

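Security updates give defenders and attackers a short but important window in which to compare vulnerable and patched software. In many operational environments, however, binary packages are more readily available than source patches or advisory text. This paper asks whether a language-model agent restricted to local, binary-derived evidence can reconstruct the security meaning of Linux distribution updates. Patch2Vuln is a locally runnable, resumable pipeline that extracts old and new ELF files, diffs them with Ghidra and Ghidriff, ranks the changed functions, compiles candidate dossiers, and asks an offline agent to produce a preliminary audit report, a validation plan, and a final audit report. We evaluated Patch2Vuln on 25 Ubuntu `.deb` package pairs: 20 security-update pairs and five negative controls, all manually verified against private source-patch and binary-function data. The agent identified a verified security-relevant patched function in 10 of the 20 security pairs and assigned an acceptable final root-cause class in 11 of 20. Oracle diagnostics show that six security pairs failed before the model-reasoning stage because the binary differ or the ranker missed the correct function, and one more failed because of a context-export miss. A separate bounded validation pass produced two minimized old/new behavioral differentials for the tcpdump package but no crash, timeout, sanitizer finding, or evidence of memory corruption. All five negative controls were classified as "unknown" and yielded no validation differentials. These results suggest that agentic vulnerability reconstruction from binary patches is a useful research target, while binary-diff coverage and local behavioral validation remain the limiting factors.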

Original Abstract

Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many operational settings, the most accessible artifacts are binary packages rather than source patches or advisory text. This paper asks whether a language-model agent, restricted to local binary-derived evidence, can reconstruct the security meaning of Linux distribution updates. Patch2Vuln is a local, resumable pipeline that extracts old/new ELF pairs, diffs them with Ghidra and Ghidriff, ranks changed functions, builds candidate dossiers, and asks an offline agent to produce a preliminary audit, bounded validation plan, and final audit. We evaluate Patch2Vuln on 25 Ubuntu `.deb` package pairs: 20 security-update pairs and five negative controls, all manually adjudicated against private source-patch and binary-function ground truth. The agent localizes a verified security-relevant patch function in 10 of 20 security pairs and assigns an accepted final root-cause class in 11 of 20. Oracle diagnostics show that six security pairs fail before model reasoning because the binary differ or ranker omits the right function, with one additional context-export miss. A separate bounded validation pass produces two target-level minimized behavioral old/new differentials, both for tcpdump, but no crash, timeout, sanitizer finding, or memory-corruption proof; all five negative controls are classified as unknown and produce no validation differentials. These results support agentic vulnerability reconstruction from binary patches as a useful research target while showing that binary-diff coverage and local behavioral validation remain the limiting components.
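The "ranks changed functions" step can be illustrated with a small sketch. The abstract does not describe Patch2Vuln's actual ranking features, so everything below is an assumption made for illustration: the `rank_changed_functions` helper, the `SECURITY_HINTS` keyword list, and the scoring weights are hypothetical, and the inputs are plain dictionaries mapping function names to decompiled text rather than real Ghidra or Ghidriff output.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical keyword list; an assumption for illustration, not taken from the paper.
SECURITY_HINTS = ("memcpy", "strcpy", "len", "size", "bounds", "alloc", "free")


@dataclass
class ChangedFunction:
    name: str
    similarity: float  # 1.0 would mean identical old/new bodies
    hint_hits: int     # hint tokens present in the new body but not the old one
    score: float       # higher means "look at this function first"


def rank_changed_functions(old, new, top_k=5):
    """Rank functions whose bodies differ between the old and new build.

    `old` and `new` map function names to decompiled text. Functions that
    exist in only one build are ignored in this simplified sketch.
    """
    ranked = []
    for name in sorted(old.keys() & new.keys()):
        old_body, new_body = old[name], new[name]
        if old_body == new_body:
            continue  # unchanged functions are not candidates
        similarity = SequenceMatcher(None, old_body, new_body).ratio()
        hint_hits = sum(
            1 for hint in SECURITY_HINTS
            if hint in new_body and hint not in old_body
        )
        # Assumed weighting: amount of change plus a bonus per new hint token.
        score = (1.0 - similarity) + 0.5 * hint_hits
        ranked.append(ChangedFunction(name, similarity, hint_hits, score))
    # Sort by descending score; break ties by name for stable output.
    ranked.sort(key=lambda f: (-f.score, f.name))
    return ranked[:top_k]
```

Calling `rank_changed_functions(old, new)` on two such dictionaries returns the changed functions with the highest assumed score first. A heuristic of this kind also shows why the ranker is a limiting component: if the patched function never enters the top-ranked list, a downstream agent has no chance to reason about it.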
