Major app stores have introduced privacy labels (e.g., Google Play's data safety section, since July 2022) that require app developers to disclose their privacy practices, including the data types collected and shared by their apps and by the third-party SDKs they use. Third-party SDK providers have published guidance pages that tell app developers which data types their SDKs use and must therefore be declared in the data safety section. The availability and correctness of these guidance pages are critical issues, yet they have received no attention so far. This paper presents the first study of the guidance pages. We first attempted to collect the guidance pages of 175 commercial SDKs widely used in Android apps and could not obtain them for 63% of the SDKs, suggesting that the majority of providers have not published guidance pages. We then develop a system that detects inconsistencies between the guidance pages and the actual data collection of SDKs. It uses machine learning and dynamic taint analysis to extract privacy practices from the guidance pages and the SDKs, and compares the results to detect gaps between them. We construct datasets of 47 guidance pages and 159 sample apps of 43 SDKs and evaluate the system on them. The system uncovered discrepancies related to location and identifiers in the guidance pages of eight SDKs. We also evaluate the machine learning model's accuracy on previously unseen guidance page contents. The results show that the model performs satisfactorily on updated guidance pages and that its accuracy on newly posted ones increases as the model learns more. This study exposes critical issues with the guidance pages and contributes tools and datasets to facilitate further research on guidance pages and privacy labels.
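The comparison step at the heart of such a system can be illustrated with a minimal sketch. This is not the authors' implementation: the `SdkReport` type, the function names, the SDK names, and the data type labels below are all hypothetical, and the sketch assumes that the extraction steps have already reduced each guidance page and each SDK's observed behavior to a set of data type labels.

```python
from dataclasses import dataclass, field


@dataclass
class SdkReport:
    """Hypothetical container for one SDK's declared and observed data types."""
    name: str
    declared: set[str] = field(default_factory=set)  # from the guidance page
    observed: set[str] = field(default_factory=set)  # from runtime analysis


def find_undeclared(report: SdkReport) -> list[str]:
    """Return data types that were observed but not declared, sorted by name."""
    return sorted(report.observed - report.declared)


def find_discrepancies(reports: list[SdkReport]) -> dict[str, list[str]]:
    """Map each SDK name to its undeclared data types, skipping consistent SDKs."""
    result = {}
    for report in reports:
        missing = find_undeclared(report)
        if missing:
            result[report.name] = missing
    return result


# Illustrative input only; these SDKs and labels are made up.
reports = [
    SdkReport(
        "ExampleAdsSDK",
        declared={"device_id"},
        observed={"device_id", "approximate_location"},
    ),
    SdkReport(
        "ExampleAnalyticsSDK",
        declared={"device_id", "app_interactions"},
        observed={"device_id"},
    ),
]

discrepancies = find_discrepancies(reports)
print(discrepancies)
```

Only the observed-but-undeclared direction is flagged here, on the assumption that an SDK declaring more than it was seen to collect is not a disclosure gap; dynamic analysis may simply not have exercised every code path.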
Inayoshi et al. studied this question.