What question did this study set out to answer?

This research aims to develop an efficient algorithm for high-occupancy itemset mining that supports clear, auditable evaluation and delivers competitive performance.

May 26, 2026 · Open Access

AURA-HOI: Auditable Unified Representative Mining for High-Occupancy Itemsets

Key Points

Introduced AURA-HOI, which separates scoring semantics from output views in high-occupancy itemset mining.
Compared AURA-HOI with HEP and DFHOI on five transactional datasets across 35 dataset–threshold configurations.
Combined vertical bitset evidence, residual occupancy-envelope pruning, and a support-equivalence ledger.
AURA-HOI ran faster than HEP in 25 of 35 configurations and faster than DFHOI in 15 of 35.
AURA-HOI used less peak memory than HEP in 28 of 35 configurations and less than DFHOI in 27 of 35.
All three algorithms emitted identical raw itemset counts under matched semantics in every configuration.
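The vertical bitset evidence listed above, and the grouping of itemsets into support-equivalence classes behind a closed-style representative view, can be illustrated with a minimal Python sketch. It assumes each item's transaction ids are stored as a Python integer bitset, so an itemset's supporting transactions are the bitwise AND of its items' bitsets and its support is the popcount. All function names are hypothetical; this is not the AURA-HOI C implementation, only an illustration of the general idea.

```python
"""Hypothetical sketch of vertical bitset evidence and a support-equivalence ledger.

Illustrative only: names and data layout are assumptions, not AURA-HOI's code.
"""
from collections import defaultdict


def build_vertical_bitsets(transactions):
    """Map each item to an int bitset whose bit k is set iff transaction k contains it."""
    bitsets = defaultdict(int)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitsets[item] |= 1 << tid
    return dict(bitsets)


def tidset(itemset, bitsets, n_transactions):
    """AND the items' bitsets together to get the itemset's supporting transactions."""
    evidence = (1 << n_transactions) - 1  # start from "every transaction"
    for item in itemset:
        evidence &= bitsets.get(item, 0)  # unknown items support nothing
    return evidence


def support(evidence):
    """Support is the number of set bits (popcount) in an evidence bitset."""
    return bin(evidence).count("1")


def support_class_representatives(itemsets, bitsets, n_transactions):
    """Hypothetical ledger: group itemsets by identical tidset, keep the largest one.

    Members of one class share the same supporting transactions, so under any
    |X| / |t| occupancy score the largest member scores highest in its class.
    Ties between equal-size members keep the first one seen.
    """
    ledger = {}
    for itemset in itemsets:
        key = tidset(itemset, bitsets, n_transactions)
        best = ledger.get(key)
        if best is None or len(itemset) > len(best):
            ledger[key] = frozenset(itemset)
    return ledger
```

For the transactions {a, b, c}, {a, b}, {a, c, d}, item `b` has bitset `0b011`, so {a, b} has support 2, and the itemsets {a}, {b}, {a, b}, {c}, {a, c} collapse into three classes represented by {a}, {a, b}, and {a, c}. Integer bitsets keep the AND-and-count step to a few machine operations per word, which is the usual motivation for a vertical layout.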

Abstract

High-occupancy itemset mining (HOIM) extends frequent itemset mining by requiring an itemset to occupy a sufficiently large fraction of the transactions in which it appears. This density-oriented objective is useful when support alone is not discriminative, but it also destroys the direct anti-monotonicity that makes classical frequent itemset mining efficient. Existing high-occupancy itemset miners further complicate evaluation because their outputs are not always the same object: some enumerate threshold-complete raw itemsets, while others report adaptive, maximal, top-k, or closed representative patterns. This paper presents AURA-HOI, an auditable high-occupancy itemset mining algorithm designed to separate scoring semantics from output views. AURA-HOI supports a raw fullset mode for direct comparison with HEP and DFHOI under a shared support–occupancy threshold, and a support-class representative mode for compact closed-output analysis. The method combines vertical bitset evidence, residual occupancy-envelope pruning, and a support-equivalence ledger. Experiments implemented in C and run in a laptop-scale environment evaluate AURA-HOI, HEP, and DFHOI on five transactional datasets: mushrooms, chess, retail, T10I4D100K, and kosarak. Across all 35 dataset–threshold configurations, the three algorithms emit identical raw itemset counts under matched semantics. AURA-HOI is faster than HEP on 25 of 35 configurations and faster than DFHOI on 15 of 35 configurations; it also uses less peak memory than HEP on 28 of 35 configurations and less than DFHOI on 27 of 35 configurations. The results show that the proposed audit-oriented design preserves raw-output equivalence while providing competitive runtime and stable memory behavior across dense, sparse, synthetic, and large clickstream workloads.
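Why occupancy breaks anti-monotonicity, and how a residual bound can restore safe pruning, can be shown with a small Python sketch. It assumes occupancy is the average of |X| / |t| over the transactions t that contain X, and that a depth-first search extends an itemset only with items ranked later in a fixed order; the paper's exact occupancy formula and the precise form of its residual occupancy-envelope pruning may differ. The helper names are hypothetical.

```python
"""Hypothetical sketch: occupancy scoring and a residual occupancy upper bound.

Assumes occupancy(X) = mean of |X| / |t| over transactions t containing X.
This is an illustrative assumption, not the paper's stated definition.
"""
from fractions import Fraction


def supporting(itemset, transactions):
    """Return the transactions that contain every item of the itemset."""
    return [t for t in transactions if itemset <= t]


def occupancy(itemset, transactions):
    """Average share of each supporting transaction taken up by the itemset."""
    sts = supporting(itemset, transactions)
    if not sts:
        return Fraction(0)
    return sum(Fraction(len(itemset), len(t)) for t in sts) / len(sts)


def occupancy_upper_bound(itemset, transactions, min_sup, order):
    """Bound the occupancy of any extension of a non-empty `itemset`.

    Extensions may only add items ranked after the itemset's last item in
    `order`. In each supporting transaction t, an extension therefore covers at
    most (|X| + residual) / |t|, where residual counts the later items of t.
    An extension with support s >= min_sup keeps s of these transactions, so
    its average cannot exceed the mean of the min_sup largest envelopes.
    Assumes min_sup >= 1.
    """
    sts = supporting(itemset, transactions)
    if len(sts) < min_sup:
        return Fraction(0)  # no extension can reach min_sup either
    last = max(order[i] for i in itemset)
    envelopes = sorted(
        (Fraction(len(itemset) + sum(1 for i in t if order[i] > last), len(t))
         for t in sts),
        reverse=True,
    )
    return sum(envelopes[:min_sup]) / min_sup
```

On the transactions {a, b}, {a, b}, {a, c, d, e}, occupancy({a}) is 5/12 while occupancy({a, b}) is 1, so a superset can outscore its subset and occupancy cannot be pruned the way support is. The bound still prunes safely: with min_sup = 2 and items ranked alphabetically, no extension of {b} can exceed 1/2, so that branch can be cut under any occupancy threshold above 1/2. Exact `Fraction` arithmetic is used so threshold comparisons are not affected by floating-point rounding.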

Cite This Study

Minh Quan Van Ha (Sun,) studied this question.

synapsesocial.com/papers/6a1539ccb5d9c58d83e8ce6a
https://doi.org/10.5281/zenodo.20362094
