2605.30208v1 May 28, 2026 cs.SE

Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency

Tianyu He
Peter C. Rigby
C. Adams
Arjun Singh Banga
P. Bansal
Pedro Canahuati
Nate Cook
B.J. Ellis
P. Goyal
G. Grewal
Matt Labunka
A. Manners
David Molnar
G. Ng
Vishal Parekh
Jiefu Pei
Frederic Sagnes
James Saindon
William E. Shackleton
S. Sidhu
Gursharan Singh
Karthik Sridhar
Matt Steiner
Pratibha Udmalpet
Sean Xia
S. Yan
Audris Mockus
N. Nagappan
Souvik Bhattacharya
Payal Bhuptani
Rujing Cao

AI-assisted coding tools have altered software production. At Meta, significant lines of code per human-landed diff grew by 105.9% year over year and per-developer diff volume rose 51%, with agentic AI responsible for over 80% of that growth. Meanwhile, the share of diffs receiving timely review has declined, exposing a widening gap between code supply and reviewer bandwidth. We ask three questions that progress from feasibility through calibration to impact: (1) can risk-stratified automation operate at scale across diverse organizations, (2) how does tuning the risk threshold affect the trade-off between automation yield and safety, and (3) to what extent does automated review reduce end-to-end latency for AI-generated changes? We deployed RADAR (Risk Aware Diff Auto Review), a multi-stage funnel that classifies each diff by authorship and source type, then applies eligibility gates, static heuristics, a machine-learned Diff Risk Score, LLM-based Automated Code Review, and deterministic validation before landing qualifying changes. We evaluate RADAR through telemetry covering 535K+ RADAR-reviewed diffs, observational before-after comparisons for policy changes, and difference-in-differences analysis of efficiency outcomes. Of the 535K+ diffs RADAR has reviewed, 331K+ have landed. Relaxing the Diff Risk Score threshold from the 25th to the 50th percentile increased the approve rate to 60.31%. The revert rate for RADAR-reviewed diffs is 1/3 that of non-RADAR diffs, and the Production Incident rate is 1/50 that of non-RADAR diffs. RADAR reduces median time to close by over 330% and median diff review wall time by 35%. Risk-aware layered automation can materially reduce review bottlenecks created by AI-driven code growth without compromising production safety.
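
To make the funnel and the percentile threshold concrete, the following is a minimal Python sketch, not RADAR's implementation. It assumes each stage can be reduced to a boolean outcome and that the Diff Risk Score is a number where higher means riskier. The `Diff` fields, `risk_cutoff`, `review`, and the stage names are all hypothetical, and the initial classification by authorship and source type is omitted.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import List, Optional, Tuple


@dataclass
class Diff:
    """Hypothetical per-diff record; every field is an illustrative assumption."""
    diff_id: str
    eligible: bool        # outcome of the eligibility gates
    heuristics_ok: bool   # outcome of the static heuristics
    risk_score: float     # assumed Diff Risk Score, higher = riskier
    llm_approved: bool    # verdict of the LLM-based review
    validation_ok: bool   # outcome of deterministic validation


def risk_cutoff(historical_scores: List[float], percentile: int) -> float:
    """Return the score at the given percentile (1..99) of past risk scores."""
    cuts = quantiles(historical_scores, n=100, method="inclusive")
    return cuts[percentile - 1]


def review(diff: Diff, cutoff: float) -> Tuple[str, Optional[str]]:
    """Run the stages in order and stop at the first one that fails.

    Returns ("auto_land", None) when every stage passes, otherwise
    ("needs_human_review", <name of the failing stage>).
    """
    stages = [
        ("eligibility", diff.eligible),
        ("heuristics", diff.heuristics_ok),
        ("risk_score", diff.risk_score <= cutoff),
        ("llm_review", diff.llm_approved),
        ("validation", diff.validation_ok),
    ]
    for name, passed in stages:
        if not passed:
            return ("needs_human_review", name)
    return ("auto_land", None)


# Synthetic score history, used only to derive example cutoffs.
history = [float(i) for i in range(1, 101)]
strict = risk_cutoff(history, 25)    # 25th-percentile threshold
relaxed = risk_cutoff(history, 50)   # 50th-percentile threshold

candidate = Diff("D1", True, True, 40.0, True, True)
print(review(candidate, strict))
print(review(candidate, relaxed))
```

In this sketch, a diff whose score falls between the two cutoffs is routed to a human under the stricter threshold and lands automatically under the relaxed one, which is the yield-versus-safety trade-off the second research question examines.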
