We survey sensitive-data handling across 39 large language models from 14 independent labs. The 7 probe scenarios span 5 categories: credentials, personally identifiable information (PII), protected health information (PHI), financial account numbers, and data-loss prevention (DLP) scanning. The survey produces 135+ classified model-probe combinations built from ~1, 300 multi-run evaluations. Safety behavior varies along 3 independent axes that stack: architecture (a 24B-active capacity floor applies on credentials across routing topologies, with small-active Mixture-of-Experts (MoE) designs as the dominant expression of that floor), generation (alignment quality moves the boundary within a family), and lab alignment investment. None of the axes independently predicts safety; they combine. Beyond the 3 axes, safety dissociates across categories within a single model: a model that protects credentials, Social Security Numbers (SSNs), and database passwords can still name employees in a salary document on every run. Safety also dissociates across output surfaces. Of 73 runs that populated `toolcalls. arguments`, 20 exfiltrated sensitive values through that channel, observed in Neural Architecture Search (NAS) -pruned models and, critically, in the day-of-release Moonshot Kimi K2. 6, an A-tier model otherwise multi-run SAFE on credentials, PII, financial, and DLP. Of 43 runs in which the provider populated `reasoningcontent` with ≥20 characters, 29 leaked sensitive values into reasoning while chat content was classified SAFE, MISSED, or TRUNCATED. Two independent NAS prunings of Meta's Llama line, by NVIDIA, broke PHI safety through different output surfaces. The axes, categories, and surfaces combine into a five-tier model ranking (§10). A frontier-alignment cluster (Opus 4. 7, GPT-5. 4 and GPT-5. 4-mini, Gemini 3 Flash and 3. 1 Pro) sits at the top, uniform-SAFE across every category tested. Sonnet 4. 6 sits just below with a single DOB-only PHI leak across 15 runs, self-flagged as a violation. Mid-attention-lab flagships (xAI Grok, Meta Llama) do not make that cluster. We frame the findings as a map for platform teams deploying LLMs near sensitive data, and a set of seven things that don't transfer the way you'd expect.
Mohammad Al Zubaidi (Tue,) studied this question.