Accurate cancer detection from histopathology images typically requires large amounts of patch-level annotations, which are costly and time-consuming to obtain in clinical settings. In practice, only patient-level or slide-level diagnoses are routinely available, which motivates weakly supervised learning approaches. In this work, we implement and evaluate an attention-based Multiple Instance Learning (MIL) framework for cancer detection on the PatchCamelyon (PCam) benchmark, a patch-level dataset derived from the CAMELYON16 whole-slide image challenge for lymph node metastasis detection, using only slide-level labels during training. Patches are grouped into bags of 32, and a lightweight attention network learns to identify the most diagnostically relevant patches without patch-level supervision. We compare this weakly supervised approach against two fully supervised baselines, ResNet50 and Vision Transformer (ViT-B/16), both fine-tuned on patch-level labels. Our attention-based MIL model achieves an AUC of 0.9860, outperforming ResNet50 (AUC: 0.9137) and ViT-B/16 (AUC: 0.9478) despite using substantially weaker supervision. Attention visualisations indicate that the model learns to focus on clinically meaningful features of tissue architecture, providing interpretability without pixel-level annotations. These results suggest that weakly supervised MIL is a promising and practical approach to cancer detection in resource-constrained clinical environments.
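To make the attention mechanism concrete, here is a minimal sketch of attention-based MIL pooling over one bag of 32 patch embeddings. It assumes the tanh attention formulation of Ilse et al. (2018): a score w·tanh(V h_k) for each patch, a softmax across the bag, and an attention-weighted sum that forms the bag embedding. The abstract does not specify the exact form of the attention network, so `attention_mil_pool`, `bag_probability`, the parameters `V`, `w`, `c`, `b`, and the random embeddings (which stand in for CNN patch features) are all hypothetical. Plain Python is used so the sketch runs without external dependencies.

```python
import math
import random

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def attention_mil_pool(H, V, w):
    """Hypothetical attention pooling (assumed Ilse et al. 2018 tanh form).

    H: K instance (patch) embeddings, each of length d.
    V: L x d projection matrix; w: attention vector of length L.
    Returns (attention weights a_k summing to 1, bag embedding z).
    """
    scores = [sum(w_l * math.tanh(v_l) for w_l, v_l in zip(w, matvec(V, h))) for h in H]
    m = max(scores)  # subtract the max score for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    a = [e / total for e in exps]
    d = len(H[0])
    # Bag embedding: attention-weighted average of the patch embeddings.
    z = [sum(a_k * h[j] for a_k, h in zip(a, H)) for j in range(d)]
    return a, z

def bag_probability(z, c, b):
    """Hypothetical linear classifier plus sigmoid on the bag embedding."""
    logit = sum(c_i * z_i for c_i, z_i in zip(c, z)) + b
    if logit >= 0:  # two branches avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)

if __name__ == "__main__":
    rng = random.Random(0)
    K, d, L = 32, 8, 4  # bag size of 32 as in the abstract; d and L are arbitrary
    H = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
    V = [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(L)]
    w = [rng.gauss(0, 1) for _ in range(L)]
    c = [rng.gauss(0, 1) for _ in range(d)]
    a, z = attention_mil_pool(H, V, w)
    top = max(range(K), key=lambda k: a[k])  # patch with the highest attention
    print(f"sum of weights = {sum(a):.3f}, top patch = {top}, "
          f"bag probability = {bag_probability(z, c, 0.0):.3f}")
```

Because the weights come from a softmax, they are positive and sum to one. The bag embedding is therefore a convex combination of patch embeddings. This is what allows the per-patch weights to be read as an attention map without any patch-level labels.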
amila belhacini (Thu,) studied this question.