2602.01553v1 Feb 02, 2026 cs.LG

Plain Transformers are Surprisingly Powerful Link Predictors

Quang Truong

Citations: 3

h-index: 1

Yu Song

Citations: 1

h-index: 1

Donald Loveland

Citations: 8

h-index: 2

Mingxuan Ju

Citations: 93

h-index: 6

Tong Zhao

Citations: 747

h-index: 12

Neil Shah

Citations: 753

h-index: 12

Jiliang Tang

Citations: 20

h-index: 3

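Link prediction is a core task in graph machine learning and requires models that capture rich, complex topological dependencies. Graph Neural Networks (GNNs) are the standard solution, but state-of-the-art pipelines often depend on explicit structural heuristics or memory-intensive node embeddings, approaches that tend to generalize or scale poorly to massive graphs. Emerging Graph Transformers (GTs) offer a potential alternative, but their complex structural encodings add substantial overhead, which makes them hard to apply to large-scale link prediction. This work challenges these sophisticated paradigms with PENCIL, an encoder-only plain Transformer. Instead of hand-crafted priors, PENCIL uses attention over sampled local subgraphs, retaining the scalability and hardware efficiency of standard Transformers. Experimental and theoretical analysis shows that PENCIL extracts richer structural information than GNNs and implicitly generalizes a broad class of heuristics and subgraph-based expressivity. Empirically, PENCIL outperforms heuristic-informed GNNs and is far more parameter-efficient than ID-embedding-based alternatives, and it remains competitive across diverse benchmarks even without node features. These results call the prevailing reliance on complex engineering techniques into question and suggest that simple design choices may be sufficient to achieve the same capabilities.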

Original Abstract

Link prediction is a core challenge in graph machine learning, demanding models that capture rich and complex topological dependencies. While Graph Neural Networks (GNNs) are the standard solution, state-of-the-art pipelines often rely on explicit structural heuristics or memory-intensive node embeddings, approaches that struggle to generalize or scale to massive graphs. Emerging Graph Transformers (GTs) offer a potential alternative but often incur significant overhead due to complex structural encodings, hindering their applications to large-scale link prediction. We challenge these sophisticated paradigms with PENCIL, an encoder-only plain Transformer that replaces hand-crafted priors with attention over sampled local subgraphs, retaining the scalability and hardware efficiency of standard Transformers. Through experimental and theoretical analysis, we show that PENCIL extracts richer structural signals than GNNs, implicitly generalizing a broad class of heuristics and subgraph-based expressivity. Empirically, PENCIL outperforms heuristic-informed GNNs and is far more parameter-efficient than ID-embedding-based alternatives, while remaining competitive across diverse benchmarks, even without node features. Our results challenge the prevailing reliance on complex engineering techniques, demonstrating that simple design choices are potentially sufficient to achieve the same capabilities.
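
The abstract does not say how PENCIL samples or tokenizes its local subgraphs, so the following is only a rough illustration of the general idea of "sampled local subgraphs" around a candidate link, not the authors' procedure. The function `sample_local_subgraph`, its parameters (`num_hops`, `max_nodes`, `seed`), and the choice of randomized breadth-first expansion are all assumptions made for this sketch.

```python
import random
from collections import deque


def sample_local_subgraph(adj, u, v, num_hops=2, max_nodes=16, seed=0):
    """Sample a bounded neighborhood around the candidate link (u, v).

    Hypothetical helper (not from the paper): breadth-first expansion
    from both endpoints, visiting neighbors in random order and
    stopping once `max_nodes` nodes have been collected.
    `adj` maps each node to the set of its neighbors (undirected graph).
    """
    rng = random.Random(seed)
    order = [u, v]            # endpoints always occupy the first two slots
    depth = {u: 0, v: 0}
    queue = deque(order)
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        if depth[node] == num_hops:
            continue          # do not expand beyond the hop limit
        neighbors = sorted(adj.get(node, ()))
        rng.shuffle(neighbors)
        for nb in neighbors:
            if nb not in depth:
                depth[nb] = depth[node] + 1
                order.append(nb)
                queue.append(nb)
                if len(order) == max_nodes:
                    break
    # Induced edges among sampled nodes, as index pairs into `order`.
    # The candidate edge itself is dropped so the label cannot leak.
    index = {n: i for i, n in enumerate(order)}
    edges = sorted(
        (index[a], index[b])
        for a in order
        for b in adj.get(a, ())
        if b in index and index[a] < index[b] and {a, b} != {u, v}
    )
    return order, edges


# Toy undirected graph, stored as symmetric adjacency sets.
toy_adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1, 4}, 4: {3}}
nodes, edges = sample_local_subgraph(toy_adj, 0, 3, num_hops=1)
```

In a pipeline of this kind, the node list would become the token sequence for a standard Transformer encoder and the edge list would inform whatever input features are used; that part is omitted here because the abstract gives no details about it.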
