2606.16952v1 Jun 15, 2026 cs.LG

Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data

Adel Javanmard
Sergei Vassilvitskii
Kareem Amin
Rudrajit Das
Alessandro Epasto
Dennis Kraft
Mónica Ribero

The rapid adoption of generative AI and Large Language Models (LLMs) has spurred interest in synthetic data as a privacy-preserving alternative to sensitive real-world datasets. However, generating high-utility synthetic data carries the risk that the generator memorizes and regurgitates private information from its training corpus. In this work, we present a customizable empirical auditing framework designed to detect and explain such data disclosures. Our framework introduces a mechanism to distinguish between "true disclosures", where the system directly reproduces a user's information, and "phantom disclosures", where the system incidentally generates a user's data. By partitioning the input data into training and holdout sets and applying rigorous statistical hypothesis testing, we determine whether observed disclosures are consistent with strict privacy baselines, such as a zero-learning baseline or a specific Differential Privacy (DP) bound. Crucially, this approach requires no model access, no canary insertion, and no reference model training: only the synthetic output and a held-out control set. We demonstrate that this framework effectively functions as a membership inference attack, providing empirical lower bounds on privacy leakage that are tighter than those of prior data-based auditing methods. Our approach is model-agnostic, applies to any synthetic data generation mechanism, and requires orders of magnitude fewer computational resources than shadow-model or canary-based alternatives.
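As a rough illustration of the hypothesis-testing step described above, here is a minimal Python sketch. It is our own construction under stated assumptions, not the authors' procedure. It assumes each user in the training and holdout splits has a binary "disclosed" indicator and treats those indicators as independent Bernoulli draws, which is a simplification since every disclosure comes from the same generator. Under these assumptions, holdout users were never trained on, so their disclosures can only be phantom disclosures, and the holdout rate serves as an estimate of the phantom rate. The zero-learning baseline is checked with a one-sided Fisher exact test. A DP bound is checked through the standard membership-inference inequality TPR ≤ e^ε · FPR + δ, using one-sided Clopper-Pearson confidence bounds. All function names (`binom_cdf`, `cp_upper`, `cp_lower`, `zero_learning_pvalue`, `epsilon_lower_bound`) are hypothetical.

```python
import math

# Hypothetical helpers; a sketch assuming independent per-user disclosure indicators.


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def cp_upper(k: int, n: int, alpha: float) -> float:
    """One-sided Clopper-Pearson upper bound: the p at which P(X <= k) drops to alpha."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; P(X <= k) decreases as p grows
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def cp_lower(k: int, n: int, alpha: float) -> float:
    """One-sided Clopper-Pearson lower bound: the p at which P(X >= k) rises to alpha."""
    if k <= 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; P(X >= k) increases as p grows
        mid = (lo + hi) / 2
        if 1.0 - binom_cdf(k - 1, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


def zero_learning_pvalue(k_train: int, n_train: int, k_hold: int, n_hold: int) -> float:
    """One-sided Fisher exact test of the zero-learning null.

    If disclosure is independent of membership, then given the total number of
    disclosures, the training-set count is hypergeometric. Returns
    P(training count >= k_train) under that null.
    """
    total_users = n_train + n_hold
    total_disclosed = k_train + k_hold
    tail = sum(math.comb(n_train, x) * math.comb(n_hold, total_disclosed - x)
               for x in range(k_train, min(total_disclosed, n_train) + 1))
    return tail / math.comb(total_users, total_disclosed)


def epsilon_lower_bound(k_train: int, n_train: int, k_hold: int, n_hold: int,
                        alpha: float = 0.05, delta: float = 0.0) -> float:
    """Empirical epsilon lower bound from TPR <= exp(eps) * FPR + delta.

    "Disclosed" is read as the guess "member": TPR is the disclosure rate on
    training users, FPR the rate on holdout users. alpha is split evenly
    between the two one-sided bounds (a union bound).
    """
    tpr_lo = cp_lower(k_train, n_train, alpha / 2)
    fpr_hi = cp_upper(k_hold, n_hold, alpha / 2)
    if tpr_lo <= delta:
        return 0.0
    return max(0.0, math.log((tpr_lo - delta) / fpr_hi))


if __name__ == "__main__":
    # Made-up counts for illustration only: 1,000 users in each split.
    print(zero_learning_pvalue(50, 1000, 5, 1000))
    print(epsilon_lower_bound(50, 1000, 5, 1000))
```

With the made-up counts in the `__main__` block (50 of 1,000 training users disclosed versus 5 of 1,000 holdout users), the zero-learning null is rejected and the returned ε bound is positive. Equal disclosure rates in both splits give a bound of 0. Splitting α evenly across the two intervals is simply the easiest valid allocation, and other allocations would also be valid.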