2605.30208v1 May 28, 2026 cs.SE

Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency

Tianyu He, Peter C. Rigby, C. Adams, Arjun Singh Banga, P. Bansal, Pedro Canahuati, Nate Cook, B.J. Ellis, P. Goyal, G. Grewal, Matt Labunka, A. Manners, David Molnar, G. Ng, Vishal Parekh, Jiefu Pei, Frederic Sagnes, James Saindon, William E. Shackleton, S. Sidhu, Gursharan Singh, Karthik Sridhar, Matt Steiner, Pratibha Udmalpet, Sean Xia, S. Yan, Audris Mockus, N. Nagappan, Souvik Bhattacharya, Payal Bhuptani, Rujing Cao

AI-assisted coding tools have altered software production. At Meta, significant lines of code per human-landed diff grew by 105.9% year over year and per-developer diff volume rose 51%, with agentic AI responsible for over 80% of that growth. Meanwhile, the share of diffs receiving timely review has declined, exposing a widening gap between code supply and reviewer bandwidth. We ask three questions that progress from feasibility through calibration to impact: (1) can risk-stratified automation operate at scale across diverse organizations, (2) how does tuning the risk threshold affect the trade-off between automation yield and safety, and (3) to what extent does automated review reduce end-to-end latency for AI-generated changes? We deployed RADAR (Risk Aware Diff Auto Review), a multi-stage funnel that classifies each diff by authorship and source type, then passes it through eligibility gates, static heuristics, a machine-learned Diff Risk Score, LLM-based Automated Code Review, and deterministic validation before landing qualifying changes. We evaluate RADAR through telemetry covering 535K+ RADAR-reviewed diffs, observational before-after comparisons for policy changes, and difference-in-differences analysis of efficiency outcomes. RADAR has reviewed 535K+ diffs and landed 331K+ of them. Relaxing the Diff Risk Score threshold from the 25th to the 50th percentile increased the approve rate to 60.31%. The revert rate for RADAR-reviewed diffs is 1/3 that of non-RADAR diffs, and the Production Incident rate is 1/50 that of non-RADAR diffs. RADAR reduces median time to close by over 330% and median diff review wall time by 35%. Risk-aware layered automation can materially reduce review bottlenecks created by AI-driven code growth without compromising production safety.
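To make the funnel and its threshold calibration concrete, here is a minimal sketch in Python. It is not RADAR's implementation: every name (`Diff`, `radar_like_funnel`, `percentile_threshold`, `approve_rate`), every field, the example source types, and the size cap are hypothetical. It assumes that a lower Diff Risk Score means a safer diff, that a diff passes the risk gate when its score is at or below a percentile of historical scores (nearest-rank percentile chosen here for simplicity), and it stubs the LLM review and deterministic validation as callables. Only the stage order follows the abstract.

```python
import math
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence


@dataclass(frozen=True)
class Diff:
    # Hypothetical fields standing in for the signals named in the abstract.
    author_kind: str             # authorship, e.g. "human" or "agent"
    source_type: str             # e.g. "codemod", "config", "product"
    lines_changed: int
    touches_sensitive_path: bool
    risk_score: float            # stand-in for the Diff Risk Score; lower = safer (assumed)


def percentile_threshold(historical_scores: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of historical risk scores."""
    if not historical_scores or not 0 < pct <= 100:
        raise ValueError("need at least one score and 0 < pct <= 100")
    ordered = sorted(historical_scores)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def radar_like_funnel(
    diff: Diff,
    risk_threshold: float,
    llm_review: Callable[[Diff], bool],
    validate: Callable[[Diff], bool],
    eligible_authors: FrozenSet[str] = frozenset({"human", "agent"}),
    eligible_sources: FrozenSet[str] = frozenset({"codemod", "config"}),
    max_lines: int = 200,
) -> str:
    """Run the stages in order; return the rejecting stage's name, or "land"."""
    # Classification + eligibility gate on authorship and source type (hypothetical sets).
    if diff.author_kind not in eligible_authors or diff.source_type not in eligible_sources:
        return "eligibility"
    # Static heuristics (hypothetical rules: size cap, sensitive paths).
    if diff.lines_changed > max_lines or diff.touches_sensitive_path:
        return "heuristics"
    # Risk gate: only diffs at or below the percentile threshold continue.
    if diff.risk_score > risk_threshold:
        return "risk_score"
    # LLM-based review, stubbed as a callable returning approve/reject.
    if not llm_review(diff):
        return "llm_review"
    # Deterministic validation (e.g. build and tests), also stubbed.
    if not validate(diff):
        return "validation"
    return "land"


def approve_rate(
    diffs: Sequence[Diff],
    threshold: float,
    llm_review: Callable[[Diff], bool],
    validate: Callable[[Diff], bool],
) -> float:
    """Fraction of diffs that pass every stage at a given risk threshold."""
    landed = sum(radar_like_funnel(d, threshold, llm_review, validate) == "land" for d in diffs)
    return landed / len(diffs)


def approve_all(diff: Diff) -> bool:
    return True  # stub reviewer/validator that approves everything


# Toy data, invented purely for illustration.
history = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60, 0.80, 0.90]
diffs = [Diff("agent", "codemod", 40, False, s) for s in (0.12, 0.18, 0.25, 0.35)]

for pct in (25, 50):
    t = percentile_threshold(history, pct)
    rate = approve_rate(diffs, t, approve_all, approve_all)
    print(f"p{pct}: threshold={t:.2f}, approve rate={rate:.2f}")
```

On this invented data, moving the threshold from the 25th to the 50th percentile raises the toy approve rate from 0.25 to 0.75, which mirrors the yield side of the yield-versus-safety trade-off the abstract describes. Putting the cheap deterministic gates before the LLM stage means the costlier review only runs on diffs that already survived them.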
