March 10, 2026 · Open Access

Influence of training data confidence on random forest–based gold mineral prospectivity models in the Timmins Region, Canada

Key Points

Evaluated how the confidence level of training data influences the accuracy of random forest models for gold prospectivity mapping.
Integrated 41 evidence layers, including geochemistry and 3D geophysical models.
Assessed the impact of different training datasets, ranging from high-confidence gold mines and prospects to lower-confidence occurrences.
Measured predictive accuracy with the Area Under the Curve (AUC) metric.
The mine-based model, trained on high-confidence data, achieved an AUC of 0.986.
The mine-based model identified 100% of known mines within 5.74% of the study area.
Lower-confidence occurrence data led to overprediction of prospective zones.
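The two headline metrics above are AUC and the share of the study area needed to capture every known mine. Both can be computed from a prospectivity grid ranked by score. The sketch below is illustrative only. It assumes that AUC refers to the receiver operating characteristic (ROC) curve, because this section does not say whether a ROC or a success-rate curve was used. It also treats every grid cell as equal in area and uses made-up scores rather than data from the study. The function names `roc_auc` and `area_to_capture` are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney probability that a randomly chosen
    deposit cell (label 1) scores higher than a randomly chosen
    non-deposit cell (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one deposit and one non-deposit cell")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a sketch, slow for large grids
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def area_to_capture(scores: Sequence[float], labels: Sequence[int],
                    target: float = 1.0) -> float:
    """Smallest fraction of the study area, taking equal-area cells in order
    of decreasing prospectivity, that contains `target` of the known deposits."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    total = sum(labels)
    if total == 0:
        raise ValueError("no known deposits")
    # Highest-scoring cells first; ties keep input order (sorted() is stable).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += labels[i]
        if captured >= target * total:
            return rank / len(scores)
    return 1.0  # unreachable for valid input; kept as a guard


# Toy example with made-up scores (not data from the study).
scores = [0.95, 0.90, 0.80, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(round(roc_auc(scores, labels), 4))  # 0.9333
print(area_to_capture(scores, labels))    # 0.5
```

In this toy grid, all three deposits fall within the top half of the ranked cells. A figure such as "100% of known mines within 5.74% of the study area" is the same quantity evaluated on the real prospectivity map.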

Abstract

• Assesses how training on gold mines, prospects, and occurrences affects model accuracy.
• Integrates 41 evidence layers, including geochemistry and 3D geophysical models to −8,250 m.
• Shows that models using high-confidence mines or prospects perform best, with the mine-based model reaching an AUC of 0.986.
• Shows that fault density and fault-intersection density control mineralization and guide gold-bearing fluids.

This research evaluates the effectiveness of Random Forest (RF)-based mineral prospectivity mapping for gold exploration in the giant Timmins–Porcupine mining region. A central and distinguishing objective is the systematic examination of different training and validation datasets (gold mines, prospects, and occurrences, as well as their combinations). The aim is to determine quantitatively how the choice of mineralization data influences model performance, predictive accuracy, and exploration efficiency. The study integrates a comprehensive suite of 41 evidence layers, covering lithological, structural, airborne magnetic, radiometric, and geochemical data, together with multi-depth 3D geophysical models reaching depths of −8,250 m. The results indicate that models trained on high-confidence data (gold mines and prospects) yield the most reliable predictions. The mine-based model achieved an Area Under the Curve (AUC) of 0.986 and identified 100% of known mines within just 5.74% of the study area. In contrast, models relying on lower-confidence “occurrences” tended to overpredict prospective zones, which highlights the limitations of such data in supervised learning. Fault density and fault-intersection density emerged as the primary controls on mineralization, emphasizing the role of crustal-scale structures as conduits for hydrothermal fluids. These findings demonstrate that the RF-based approach is uniquely suited to regional and greenfield exploration, where high-confidence search-space reduction is critical.
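The abstract identifies fault density and fault-intersection density as the strongest controls on mineralization, but this section does not describe how those evidence layers were built. The sketch below shows one simple way to derive a fault-intersection density grid from mapped fault traces. It rests on several assumptions:

• Traces are straight-segment polylines in a projected coordinate system (hypothetically in km).
• Cells are equal and square.
• Density is a plain count of crossings per cell. The published layers may instead use a kernel or moving-window density.

The names `segment_intersection` and `fault_intersection_density` and all grid parameters are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def segment_intersection(a: Segment, b: Segment) -> Optional[Point]:
    """Crossing point of two straight segments, or None if they do not meet.
    Parallel and collinear segments are treated as non-crossing."""
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom  # position on a
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / denom  # position on b
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def fault_intersection_density(faults: Sequence[Sequence[Point]],
                               x0: float, y0: float, cell_size: float,
                               nx: int, ny: int) -> List[List[float]]:
    """Crossings between different fault traces per unit area, on an
    nx-by-ny grid of square cells whose lower-left corner is (x0, y0).
    Row 0 is the row nearest y0."""
    segments = [(fid, (p, q))
                for fid, trace in enumerate(faults)
                for p, q in zip(trace, trace[1:])]
    counts = [[0] * nx for _ in range(ny)]
    for (fa, sa), (fb, sb) in combinations(segments, 2):
        if fa == fb:
            continue  # ignore segments belonging to the same fault trace
        pt = segment_intersection(sa, sb)
        if pt is None:
            continue
        # Caveat: a crossing that lands exactly on a trace vertex is counted
        # once per adjoining segment; a real workflow would deduplicate.
        col = int((pt[0] - x0) // cell_size)
        row = int((pt[1] - y0) // cell_size)
        if 0 <= col < nx and 0 <= row < ny:
            counts[row][col] += 1
    area = cell_size * cell_size
    return [[c / area for c in r] for r in counts]


# Hypothetical fault traces in km: one E-W, one N-S (two segments), one short E-W.
faults = [
    [(0.0, 0.5), (2.0, 0.5)],
    [(1.5, 0.0), (1.5, 1.0), (1.5, 2.0)],
    [(0.2, 1.5), (0.8, 1.5)],
]
print(fault_intersection_density(faults, 0.0, 0.0, 1.0, 2, 2))
# [[0.0, 1.0], [0.0, 0.0]]
```

The pairwise loop is quadratic in the number of segments, which keeps the logic transparent. For a regional fault map, a spatial index or a GIS overlay would be the practical choice. A fault (length) density layer follows the same pattern, but it sums clipped segment lengths per cell instead of counting crossings.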
