2605.30152v1 May 28, 2026 cs.CL

Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?

Ruowang Zhang

Citations: 28

h-index: 3

Michel Galley

Citations: 2,129

h-index: 9

Amir H. Abdi

Citations: 16

h-index: 3

Zhikai Chen

Citations: 404

h-index: 9

Siheng Xiong

Citations: 24

h-index: 2

Xiaoqian Wang

Citations: 213

h-index: 4

Jing Gao

Citations: 2

h-index: 1

Xiaoze Liu

Citations: 2,445

h-index: 20

Proactive agents read user activity as text and call an LLM on every event to decide whether to act. But user activity is not natively text: it is a structured event stream of (actor, verb, object, timestamp) tuples that the operating system already maintains in graph form. Rendering that structure as text and asking an LLM to recover it is a round trip the system never needed to take. We instead treat the always-on signal as graph updates and use a small temporal-graph-learning (TGL) model as the encoder: one forward pass yields a per-event trigger probability and a per-entity routing score. The only LLM call is the downstream agent, which turns a small structured handoff into a fluent user-facing sentence and is invoked only when the trigger fires. TGL improves F1 on each of 14 backbones (mean +16.7, up to +46.0); in trigger-architecture comparisons, a single TGL checkpoint gives the strongest trigger AUCs and the most stable deployed threshold. It runs at 11.13 ms per event on a GPU server and 13.99 ms on a consumer laptop, approximately 4–7x and 12–83x faster, respectively, than every single-forward LLM-as-trigger configuration tested in that regime. Its approximately 220 MiB BF16 resident footprint makes it deployable on-device alongside the privacy-sensitive activity stream it consumes.
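The control flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: `ToyEncoder` is a hypothetical counting stand-in for the TGL model, and the names `Event`, `process`, `call_llm`, `threshold`, and `top_k` are assumptions introduced only for illustration. The sketch shows just the gating pattern: every event updates a cheap encoder that returns a trigger probability and per-entity routing scores, and the LLM callable is reached only when the trigger fires.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """One (actor, verb, object, timestamp) tuple from the activity stream."""
    actor: str
    verb: str
    obj: str
    timestamp: float


class ToyEncoder:
    """Hypothetical stand-in for the TGL model (assumption, not the paper's model).

    It keeps a running count of events touching each entity. The "trigger
    probability" is a squashed count for the event's object, and the
    "routing scores" are the raw per-entity counts.
    """

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def __call__(self, ev: Event) -> Tuple[float, Dict[str, float]]:
        # Treat the event as a graph update touching two nodes.
        for entity in (ev.actor, ev.obj):
            self.counts[entity] = self.counts.get(entity, 0) + 1
        c = self.counts[ev.obj]
        trigger_prob = c / (1.0 + c)  # maps a count into (0, 1)
        routing = {name: float(n) for name, n in self.counts.items()}
        return trigger_prob, routing


def process(
    events: List[Event],
    encoder: Callable[[Event], Tuple[float, Dict[str, float]]],
    threshold: float,
    call_llm: Callable[[dict], str],
    top_k: int = 2,
) -> List[str]:
    """Run the encoder on every event; call the LLM only when the trigger fires."""
    messages: List[str] = []
    for ev in events:
        trigger_prob, routing = encoder(ev)
        if trigger_prob < threshold:
            continue  # stay asleep: no LLM call for this event
        # Anchor the handoff on the highest-scoring entities.
        anchors = sorted(routing, key=routing.get, reverse=True)[:top_k]
        handoff = {"event": ev, "trigger_prob": trigger_prob, "anchors": anchors}
        messages.append(call_llm(handoff))
    return messages
```

In a real system the encoder would be the trained TGL checkpoint and `call_llm` would wrap the downstream agent; the point of the sketch is only that the expensive call sits behind the threshold test rather than in the per-event path.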

