Probabilistic estimation of re-identification risk is well developed for random representative samples drawn from the general population, such as the large-scale government surveys conducted regularly by National Statistical Institutes. Recent work extended this approach to assess the risk of re-identification in non-probability subpopulation registers, such as a cancer register. In this paper, we extend that work further to samples drawn from registers and, more generally, to non-probability samples, such as those used in opt-in panels at survey organizations. We assume that membership in the subpopulation register is not known and that the sampling mechanism is also unknown. We show how to assess the risk of re-identification for these non-probability samples by using a probability-based reference sample to infer population parameters within the probabilistic modelling framework. We demonstrate the approach with a simulation study and a real application to the 2021 Survey of Doctoral Recipients, which is drawn from a subpopulation register of all PhD recipients from an accredited US institution.
Schmid et al. studied this question.
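For orientation, the standard probabilistic framework for probability samples, which is the baseline this work builds on, treats each key k (a cell of the cross-classified quasi-identifiers) as having population frequency F_k ~ Poisson(λ_k) and sample frequency f_k ~ Poisson(π_k λ_k). For a sample unique, this gives F_k − 1 ~ Poisson(λ_k(1 − π_k)). The sketch below computes the two usual global risk measures over sample uniques: the expected number of sample uniques that are also population unique, and the expected number of correct matches. It is a minimal illustration under stated assumptions. It assumes λ_k and π_k have already been estimated (for example, from a log-linear model), and it does not implement the extension to non-probability samples described above, where these parameters must be inferred with the help of a reference sample. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def expected_inverse_pop_freq(mu: float) -> float:
    """E(1/F_k | f_k = 1) when F_k - 1 ~ Poisson(mu), with mu = lambda_k * (1 - pi_k).

    Summing 1/(1+j) * P(Poisson(mu) = j) over j gives (1 - exp(-mu)) / mu.
    """
    if mu == 0.0:
        return 1.0  # limit of (1 - exp(-mu)) / mu as mu -> 0
    return -math.expm1(-mu) / mu  # -expm1(-mu) == 1 - exp(-mu), computed stably


def global_risk(
    sample_freqs: Sequence[int],
    lambdas: Sequence[float],
    pis: Sequence[float],
) -> Tuple[float, float]:
    """Return (tau1, tau2) summed over keys that are sample unique (f_k == 1).

    tau1: expected number of sample uniques that are population unique,
          sum of P(F_k = 1 | f_k = 1) = exp(-lambda_k * (1 - pi_k)).
    tau2: expected number of correct matches among sample uniques,
          sum of E(1/F_k | f_k = 1).
    Assumes lambda_k and pi_k are given (hypothetical, pre-estimated inputs).
    """
    if not (len(sample_freqs) == len(lambdas) == len(pis)):
        raise ValueError("inputs must have one entry per key")
    tau1 = 0.0
    tau2 = 0.0
    for f, lam, pi in zip(sample_freqs, lambdas, pis):
        if f != 1:
            continue  # only sample uniques contribute to these measures
        mu = lam * (1.0 - pi)  # expected number of unsampled population records in key k
        tau1 += math.exp(-mu)
        tau2 += expected_inverse_pop_freq(mu)
    return tau1, tau2


# Usage with made-up illustrative values (not data from the study):
tau1, tau2 = global_risk([1, 2, 1, 0], [2.0, 5.0, 0.5, 1.0], [0.1] * 4)
print(f"tau1={tau1:.4f}, tau2={tau2:.4f}")
```

Only keys 0 and 2 are sample unique in this toy input, so only they contribute. Keeping the per-key conditional probability separate from the summation makes it straightforward to report record-level risks as well as the global totals.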