This paper addresses the challenge of protecting Personally Identifiable Information (PII) in textual data: identifying and anonymizing PII to ensure privacy and regulatory compliance while preserving data utility. We model this bi-criteria optimization problem as a two-player Stackelberg game, in which an attacker seeks to link anonymized data back to individuals and a protector anonymizes the data to prevent re-identification. We show that the problem is intractable and therefore develop SHIELD, an attack-aware PII protection system that iteratively engages the protector and attacker to prevent both PII breaches and over-scrubbing. SHIELD integrates logical reasoning with machine learning to identify PII, and supports pluggable attackers for robustness against re-identification. It achieves a constant-factor approximation for utility loss while mitigating risk. Using synthetic and real-world datasets, we empirically show that SHIELD offers a better privacy-utility trade-off than prior PII protection systems, while remaining efficient and scalable.
Liu et al. (Mon,) studied this question.