2602.01771v1 Feb 02, 2026 cs.CL

<SOG_k>: One LLM Token for Explicit Graph Structural Understanding

Zijun Di (Citations: 3; h-index: 1)

Bin Lu, Shanghai Jiao Tong University (Citations: 432; h-index: 8)

Luoyi Fu (Citations: 3,774; h-index: 32)

Xiaoying Gan (Citations: 70; h-index: 5)

Cheng Zhou (Citations: 1,450; h-index: 20)

Jingyao Wu (Citations: 17; h-index: 3)

Meng Jin (Citations: 78; h-index: 2)

Xinbing Wang (Citations: 10; h-index: 2)

Original Abstract

Large language models (LLMs) show great potential for understanding unstructured data, but they still struggle with graphs because of structural hallucination. Existing approaches mostly take one of two routes. Some verbalize graphs into natural language, which leads to excessive token consumption and scattered attention. Others transform graphs into trainable continuous embeddings (i.e., soft prompts), which are severely misaligned with the original text tokens. To address these problems, we propose one special token, <SOG_k>, that fully represents the Structure Of Graph within a unified token space, facilitating explicit topology input and structural information sharing. Specifically, we propose a topology-aware structural tokenizer that maps each graph topology to a single, highly selective token. We then construct a set of hybrid structure question-answering corpora to align the new structural tokens with existing text tokens. With this approach, <SOG_k> enables LLMs to understand, generate, and reason about graphs concisely and accurately. Extensive experiments on five graph-level benchmarks show improvements of 9.9% to 41.4% over the baselines, along with interpretability and consistency. Furthermore, our method extends flexibly to node-level tasks, enabling both global and local structural understanding. The codebase is publicly available at https://github.com/Jingyao-Wu/SOG.
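To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' tokenizer. It compares verbalizing a graph edge by edge with mapping the whole topology to one special token. Everything here is an assumption made for illustration: the names `verbalize`, `wl_signature`, and `ToyStructureTokenizer` are hypothetical, and a deterministic Weisfeiler-Lehman style fingerprint stands in for the learned, topology-aware tokenizer the paper describes.

```python
import hashlib
from collections import Counter


def verbalize(edges):
    """Baseline: describe the graph in natural language, one sentence per edge."""
    return " ".join(f"Node {u} is connected to node {v}." for u, v in edges)


def wl_signature(edges, rounds=3):
    """Toy topology fingerprint using Weisfeiler-Lehman colour refinement.

    ASSUMPTION: this is a stand-in for a learned structural tokenizer,
    not the method from the paper. Node labels are ignored, so relabelled
    copies of the same graph share a fingerprint.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    colour = {n: "0" for n in adj}
    for _ in range(rounds):
        # Each node's new colour hashes its own colour plus its neighbours' colours.
        colour = {
            n: hashlib.sha256(
                (colour[n] + "|" + ",".join(sorted(colour[m] for m in adj[n]))).encode()
            ).hexdigest()
            for n in adj
        }
    # The graph fingerprint is the sorted multiset of final node colours.
    multiset = sorted(Counter(colour.values()).items())
    return hashlib.sha256(repr(multiset).encode()).hexdigest()


class ToyStructureTokenizer:
    """Hypothetical codebook: one <SOG_k> token per distinct fingerprint."""

    def __init__(self):
        self.codebook = {}

    def encode(self, edges):
        sig = wl_signature(edges)
        # Reuse the index if this topology was seen before, else assign the next one.
        k = self.codebook.setdefault(sig, len(self.codebook))
        return f"<SOG_{k}>"


if __name__ == "__main__":
    tok = ToyStructureTokenizer()
    triangle = [(0, 1), (1, 2), (2, 0)]
    path = [(0, 1), (1, 2)]
    print(verbalize(triangle))   # text length grows with the number of edges
    print(tok.encode(triangle))  # a single structure token
    print(tok.encode(path))      # a different topology gets a different token
```

In this sketch the verbalized form costs several words per edge, while the structure token stays constant in length regardless of graph size. Note that Weisfeiler-Lehman refinement cannot separate every pair of non-isomorphic graphs, so a real tokenizer would need a more selective mapping, which is the property the abstract attributes to the proposed method.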

2 Citations

0 Influential

36 Altmetric

182.0 Score
