• Assesses how gold mines, prospects, and occurrences impact model accuracy.
• Integrates 41 layers, including geochemistry, and 3D geophysical models to −8,250 m.
• Shows models using high-confidence mines/prospects achieve AUC of 0.986.
• Shows fault and fault-intersection density control mineralization and guide gold-bearing fluids.

This research evaluates the effectiveness of Random Forest-based mineral prospectivity mapping for gold exploration in the giant Timmins–Porcupine mining regions. A central and distinguishing objective of this approach is the systematic examination of different training and validation datasets (gold mines, prospects, and occurrences, as well as their combinations) to quantitatively determine how the selection of mineralization data influences model performance, predictive accuracy, and exploration efficiency. This study integrates a comprehensive suite of 41 evidence layers, featuring lithological, structural, airborne magnetic, radiometric, and geochemical data, alongside multi-depth 3D geophysical models reaching depths of −8,250 m. The results indicate that models trained on high-confidence data (gold mines and prospects) yield the most reliable predictions, with the mine-based model achieving an Area Under the Curve (AUC) of 0.986 and successfully identifying 100% of known mines within just 5.74% of the study area. In contrast, models relying on lower-confidence “occurrences” tended to overpredict prospective zones, highlighting the limitations of such data in supervised learning. Fault density and fault-intersection density emerged as the primary controls on mineralization, emphasizing the role of crustal-scale structures as conduits for hydrothermal fluids. These findings demonstrate that the RF-based approach is uniquely suited for regional and greenfield exploration, where high-confidence search-space reduction is critical.
Behnia et al. (Sun,) studied this question.
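The two headline metrics in the abstract, the AUC and the share of the study area needed to capture all known mines, can be illustrated with a minimal sketch. The abstract does not describe how the authors computed them, so the following is an assumption: it treats the study area as equal-sized grid cells, each with a prospectivity score and a flag marking whether it hosts a known deposit. The function names `roc_auc` and `area_fraction_to_capture` and the toy values are hypothetical and are not taken from the study.

```python
import math

def roc_auc(scores, is_deposit):
    """Rank-based AUC: probability that a deposit cell outscores a barren cell.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for s, d in zip(scores, is_deposit) if d]
    neg = [s for s, d in zip(scores, is_deposit) if not d]
    if not pos or not neg:
        raise ValueError("need at least one deposit and one non-deposit cell")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def area_fraction_to_capture(scores, is_deposit, target=1.0):
    """Smallest fraction of cells, taken in descending score order,
    that contains at least `target` share of the deposit cells.

    Assumes equal-area cells. Among tied scores, non-deposit cells are
    ranked first, so the result is pessimistic rather than optimistic.
    """
    total = sum(is_deposit)
    if total == 0:
        raise ValueError("no deposit cells")
    needed = math.ceil(target * total)
    order = sorted(range(len(scores)),
                   key=lambda i: (-scores[i], is_deposit[i]))
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += is_deposit[i]
        if captured >= needed:
            return rank / len(scores)

# Toy data, invented for illustration only (not values from the study).
scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7, 0.4, 0.05, 0.6, 0.15]
is_deposit = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

print("AUC:", roc_auc(scores, is_deposit))
print("Area fraction for 100% capture:",
      area_fraction_to_capture(scores, is_deposit))
```

Read this way, a result such as "100% of known mines within 5.74% of the study area" corresponds to `area_fraction_to_capture` returning 0.0574 with `target=1.0`. The metric depends on the worst-ranked deposit, so a single poorly scored mine can inflate it even when the AUC is high.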