What question did this study set out to answer?

The aim is to develop differentially private data structures for counting substrings and documents containing specific patterns.

April 25, 2026

A Differentially Private Data Structure for Substring and Document Counting

Key points

Introduces the first differentially private data structures for substring counting and document counting
Provides bounds on the additive error of these data structures
For ε-differential privacy, shows that the error is optimal up to a poly-logarithmic factor in the number of documents and the length of the longest document
Yields improved algorithms for related problems, such as privately mining frequent substrings and q-grams
Gives rigorous privacy guarantees, at the cost of less accurate results than non-private counting

Abstract

For databases consisting of many text documents, one of the most fundamental data analysis tasks is counting (i) how often a pattern appears as a substring in the database (substring counting) and (ii) how many documents in the collection contain the pattern as a substring (document counting). If such a database contains sensitive data, it is crucial to protect the privacy of individuals in the database. Differential privacy is the gold standard for privacy in data analysis. It gives rigorous privacy guarantees, but comes at the cost of yielding less accurate results. In this paper, we study the problem of substring and document counting under differential privacy. We give the first differentially private data structures for these problems and provide bounds on their additive error. For ε-differential privacy, we show that the error of our data structure is optimal up to a poly-logarithmic factor in the number of documents and length of the longest document. Our data structures immediately lead to improved algorithms for related problems, such as privately mining frequent substrings and q-grams.
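To make the two counting tasks concrete, here is a minimal Python sketch. It is not the data structure from the paper: `substring_count` and `document_count` are plain, non-private counters, and `noisy_document_count` is a hypothetical helper that applies the standard Laplace mechanism to a single fixed pattern. It assumes that two databases are neighbors when they differ in one document, so the document count of one pattern changes by at most 1. All function names are illustrative.

```python
import random


def substring_count(docs, pattern):
    """Total number of occurrences of `pattern` across all documents.

    Overlapping occurrences are counted, e.g. "aa" occurs twice in "aaa".
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    total = 0
    for doc in docs:
        start = doc.find(pattern)
        while start != -1:
            total += 1
            # Advance by one character so overlapping matches are found.
            start = doc.find(pattern, start + 1)
    return total


def document_count(docs, pattern):
    """Number of documents that contain `pattern` at least once."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    return sum(1 for doc in docs if pattern in doc)


def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_document_count(docs, pattern, epsilon, rng):
    """Laplace mechanism for ONE fixed pattern (illustrative assumption).

    Assumes neighboring databases differ in one document, so the
    sensitivity of the document count is 1 and the noise scale is
    1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return document_count(docs, pattern) + laplace_noise(1.0 / epsilon, rng)


docs = ["abab", "bab", "ccc"]
exact_substrings = substring_count(docs, "ab")  # "abab" has 2, "bab" has 1
exact_documents = document_count(docs, "ab")    # "abab" and "bab"
noisy = noisy_document_count(docs, "ab", epsilon=1.0, rng=random.Random(0))
```

This sketch protects only a single query. Answering every possible pattern this way would spend privacy budget on each query, which is the situation a dedicated data structure is meant to avoid. Substring counts are also more sensitive than document counts under the same assumption, because one document can contribute many occurrences of a pattern. Sampling Laplace noise with floating-point arithmetic as above is fine for illustration but is not hardened for production use.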


Cite This Study

Bernardini et al. studied this question.

synapsesocial.com/papers/69ec5b2388ba6daa22dacbc5 https://doi.org/10.1145/3810900.3810908
