Bowei He
Publications
Beyond Message Passing: A Semantic View of Agent Communication Protocols
Agent communication protocols are becoming critical infrastructure for large language model (LLM) systems that must use tools, coordinate with other agents, and operate across heterogeneous environments. This work presents a perspective on this emerging landscape inspired by human communication, organizing agent communication into three layers: a communication layer, a syntactic layer, and a semantic layer. Under this framework, we systematically analyze 18 representative protocols and compare how they support reliable transport, structured interaction, and meaning-level coordination. Our analysis shows a clear imbalance in current protocol design. Most protocols provide increasingly mature support for transport, streaming, schema definition, and lifecycle management, but offer limited protocol-level mechanisms for clarification, context alignment, and verification. As a result, semantic responsibilities are often pushed into prompts, wrappers, or application-specific orchestration logic, creating hidden interoperability and maintenance costs. To make this gap actionable, we further identify major forms of technical debt in today's protocol ecosystem and distill practical guidance for selecting protocols under different deployment settings. We conclude by outlining a research agenda for interoperable, secure, and semantically robust agent ecosystems that move beyond message passing toward shared understanding.
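To make the layering concrete, the toy sketch below separates a single message into communication, syntactic, and semantic sections. It is an illustration of the framework only; the field names are hypothetical and do not reflect the wire format of any protocol surveyed in the paper.

```python
# Purely illustrative toy envelope showing how responsibilities could split
# across the three layers discussed above. All field names are hypothetical
# and do not correspond to any specific protocol analyzed in the paper.
message = {
    # Communication layer: reliable transport, streaming, lifecycle.
    "communication": {
        "transport": "http/2",
        "stream_id": "s-42",
        "sequence": 7,
    },
    # Syntactic layer: schema-governed structure of the payload.
    "syntactic": {
        "schema": "tool_call.v1",
        "payload": {"tool": "weather_lookup", "args": {"city": "Oslo"}},
    },
    # Semantic layer: meaning-level coordination (clarification, context
    # alignment, verification), where protocol-level support is thinnest.
    "semantic": {
        "intent": "answer_user_query",
        "context_refs": ["conversation:1893"],
        "verification": {"expected": "temperature_in_celsius"},
    },
}
```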
Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
Semantic routers in LLM inference gateways select tools in the critical request path, where every millisecond of added latency compounds across millions of requests. We propose Outcome-Aware Tool Selection (OATS), which interpolates tool embeddings toward the centroid of queries on which they have historically succeeded, an offline process that adds no parameters, latency, or GPU cost at serving time. On MetaTool (199 tools, 4,287 queries), this improves NDCG@5 from 0.869 to 0.940; on ToolBench (2,413 APIs), from 0.834 to 0.848. We also evaluate two learned extensions: a 2,625-parameter MLP re-ranker and a 197K-parameter contrastive adapter. The MLP re-ranker hurts or merely matches the baseline when outcome data is sparse relative to the tool set; the contrastive adapter provides comparable gains on MetaTool (NDCG@5: 0.931). All methods are evaluated on the same held-out 30% test split. The practical takeaway is to start with the zero-cost refinement and add learned components only when data density warrants it. All mechanisms run within single-digit-millisecond CPU budgets.
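The core OATS refinement is a small offline embedding update. The sketch below is a minimal reconstruction under stated assumptions: cosine-similarity routing, a single interpolation weight alpha, and re-normalization after the update are illustrative choices, and the function names are hypothetical rather than the paper's exact formulation.

```python
import numpy as np

def refine_tool_embeddings(tool_embs, query_embs, success_pairs, alpha=0.3):
    """Offline refinement sketch: pull each tool embedding toward the centroid
    of the query embeddings on which that tool historically succeeded.

    tool_embs:     (T, d) array of tool embeddings
    query_embs:    (Q, d) array of query embeddings
    success_pairs: iterable of (query_idx, tool_idx) pairs marked successful
    alpha:         interpolation weight (illustrative; not the paper's value)
    """
    refined = tool_embs.copy()
    # Group successful query indices by tool.
    successes_by_tool = {}
    for q_idx, t_idx in success_pairs:
        successes_by_tool.setdefault(t_idx, []).append(q_idx)
    for t_idx, q_idxs in successes_by_tool.items():
        centroid = query_embs[q_idxs].mean(axis=0)
        # Interpolate toward the success centroid, then re-normalize so the
        # serving-time similarity search is unchanged in cost and form.
        moved = (1.0 - alpha) * refined[t_idx] + alpha * centroid
        refined[t_idx] = moved / np.linalg.norm(moved)
    return refined

def route(query_emb, refined_tool_embs, k=5):
    # Serving time: the same nearest-neighbor lookup as before, now over the
    # refined embeddings, so no parameters or latency are added in the request path.
    scores = refined_tool_embs @ (query_emb / np.linalg.norm(query_emb))
    return np.argsort(-scores)[:k]
```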