Avaljot Singh
Publications
RuleFlow : Generating Reusable Program Optimizations with LLMs
Optimizing Pandas programs is a challenging problem. Existing systems and compiler-based approaches offer reliability but are either heavyweight or support only a limited set of optimizations. Conversely, using LLMs in a per-program optimization methodology can synthesize nontrivial optimizations, but is unreliable, expensive, and offers a low yield. In this work, we introduce a hybrid approach that works in a 3-stage manner that decouples discovery from deployment and connects them via a novel bridge. First, it discovers per-program optimizations (discovery). Second, they are converted into generalised rewrite rules (bridge). Finally, these rules are incorporated into a compiler that can automatically apply them wherever applicable, eliminating repeated reliance on LLMs (deployment). We demonstrate that RuleFlow is the new state-of-the-art (SOTA) Pandas optimization framework on PandasBench, a challenging Pandas benchmark consisting of Python notebooks. Across these notebooks, we achieve a speedup of up to 4.3x over Dias, the previous compiler-based SOTA, and 1914.9x over Modin, the previous systems-based SOTA. Our code is available at https://github.com/ADAPT-uiuc/RuleFlow.
AgentRx: Diagnosing AI Agent Failures from Execution Trajectories
AI agents often fail in ways that are difficult to localize because executions are probabilistic, long-horizon, multi-agent, and mediated by noisy tool outputs. We address this gap by manually annotating failed agent runs and release a novel benchmark of 115 failed trajectories spanning structured API workflows, incident management, and open-ended web/file tasks. Each trajectory is annotated with a critical failure step and a category from a grounded-theory derived, cross-domain failure taxonomy. To mitigate the human cost of failure attribution, we present AGENTRX, an automated domain-agnostic diagnostic framework that pinpoints the critical failure step in a failed agent trajectory. It synthesizes constraints, evaluates them step-by-step, and produces an auditable validation log of constraint violations with associated evidence; an LLM-based judge uses this log to localize the critical step and category. Our framework improves step localization and failure attribution over existing baselines across three domains.