Ruilin Wu
Publications
Order Matters: Unveiling the Hidden Impact of Macro Placement Sequences via Proxy-Guided LLM Evolution
Macro placement is a fundamental step in modern chip physical design, playing a crucial role in determining the solution quality of high-dimensional combinatorial optimization problems. Despite recent advancements in machine learning for spatial coordinate determination, the temporal dimension of placement sequencing remains largely governed by static heuristics. In this work, we demonstrate that the placement sequence is not merely a preprocessing step but a decisive factor in optimization, where suboptimal early decisions trigger irreversible domino effects that constrain the solution space. To harness this unexplored dimension, we propose \textbf{OrderPlace}, a proxy-guided LLM evolution framework for automatically discovering macro placement order strategies. Instead of relying on manually crafted heuristics such as area- or connectivity-based ordering, OrderPlace explores a broader space of code-level policies, ranging from static scoring metrics to dynamic physics-inspired mechanisms. To mitigate the prohibitive cost of evaluating sequences, we introduce a lightweight proxy evaluation mechanism that efficiently filters candidates using a deterministic greedy probe. Experimental results on the standard ISPD 2005 benchmarks demonstrate that OrderPlace discovers novel ordering strategies. Compared with WireMask-EA and the state-of-the-art method EGPlace, OrderPlace reduces wirelength by 34.04\% and 14.08\%, respectively.
Enhancing LUT-based Deep Neural Networks Inference through Architecture and Connectivity Optimization
Deploying deep neural networks (DNNs) on resource-constrained edge devices such as FPGAs requires a careful balance among latency, power, and hardware resource usage, while maintaining high accuracy. Existing Lookup Table (LUT)-based DNNs -- such as LogicNets, PolyLUT, and NeuraLUT -- face two critical challenges: the exponential growth of LUT size and inefficient random sparse connectivity. This paper presents SparseLUT, a comprehensive framework that addresses these challenges through two orthogonal optimizations. First, we propose an architectural enhancement that aggregates multiple PolyLUT sub-neurons via an adder, significantly reducing LUT consumption by 2.0x-13.9x and lowering inference latency by 1.2x-1.6x, all while maintaining comparable accuracy. Building upon this foundation, we further introduce a non-greedy training algorithm that optimizes neuron connectivity by selectively pruning less significant inputs and strategically regrowing more effective ones. This training optimization, which incurs no additional area and latency overhead, delivers consistent accuracy improvements across benchmarks -- achieving up to a 2.13% gain on MNIST and 0.94% on Jet Substructure Classification compared to existing LUT-DNN approaches.