2602.10787v1 Feb 11, 2026 cs.SE

VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection

Samal Mukhtar (Citations: 4, h-index: 1)

Yi Yao (Citations: 84, h-index: 3)

Zhu Sun (Citations: 137, h-index: 4)

Mustafa Mustafa (Citations: 10, h-index: 2)

Y. Ong (Citations: 359, h-index: 11)

Youchen Sun (Citations: 65, h-index: 6)

Original Abstract

Software vulnerability detection (SVD) is a critical challenge in modern systems. Large language models (LLMs) offer natural-language explanations alongside predictions, but most work focuses on binary evaluation, and explanations often lack semantic consistency with Common Weakness Enumeration (CWE) categories. We propose VulReaD, a knowledge-graph-guided approach for vulnerability reasoning and detection that moves beyond binary classification toward CWE-level reasoning. VulReaD leverages a security knowledge graph (KG) as a semantic backbone and uses a strong teacher LLM to generate CWE-consistent contrastive reasoning supervision, enabling student model training without manual annotations. Students are fine-tuned with Odds Ratio Preference Optimization (ORPO) to encourage taxonomy-aligned reasoning while suppressing unsupported explanations. Across three real-world datasets, VulReaD improves binary F1 by 8-10% and multi-class classification by 30% Macro-F1 and 18% Micro-F1 compared to state-of-the-art baselines. Results show that LLMs outperform deep learning baselines in binary detection and that KG-guided reasoning enhances CWE coverage and interpretability.
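The abstract names Odds Ratio Preference Optimization (ORPO) but does not spell out the objective. As a rough illustration, the sketch below follows the formulation in the original ORPO paper: a supervised negative log-likelihood term on the preferred response plus a weighted odds-ratio penalty against the rejected one. It is not taken from VulReaD itself. The function names, the weight `lam`, and the reading of "chosen" as a CWE-consistent explanation and "rejected" as an unsupported one are all assumptions made for this example.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) given log p.

    `avg_logp` is assumed to be the length-normalised (per-token average)
    log-likelihood of a response, so it must be strictly negative.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Per-example ORPO objective: L_SFT + lam * L_OR (hypothetical helper).

    logp_chosen   -- average log-likelihood of the preferred response
                     (assumed here: CWE-consistent reasoning from the teacher)
    logp_rejected -- average log-likelihood of the dispreferred response
                     (assumed here: an unsupported explanation)
    lam           -- weight of the odds-ratio term; 0.1 is an arbitrary choice
    """
    # Supervised term: negative log-likelihood of the preferred response.
    sft_term = -logp_chosen
    # Log odds ratio between the preferred and the rejected response.
    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    or_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * or_term


if __name__ == "__main__":
    good = orpo_loss(math.log(0.8), math.log(0.2))
    bad = orpo_loss(math.log(0.2), math.log(0.8))
    print(f"preferred response more likely:  {good:.4f}")
    print(f"preferred response less likely:  {bad:.4f}")
```

The loss falls as the model assigns higher likelihood to the preferred response relative to the rejected one, which is the mechanism the abstract describes as encouraging taxonomy-aligned reasoning while suppressing unsupported explanations. A real training loop would compute the two log-likelihoods from model logits over batches rather than from scalar inputs.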
