May 9, 2026 · Open Access

Weakly Supervised Cancer Detection in Histopathology Images via Attention-Based Multiple Instance Learning: A Comparative Study

Key Points

Aimed to develop an effective method for detecting cancer in histopathology images using weakly supervised learning.
Implemented an attention-based multiple instance learning (MIL) framework on the PatchCamelyon (PCam) benchmark.
Grouped patches into bags of 32 and trained a lightweight attention network using only slide-level labels.
Compared the MIL approach with two fully supervised baselines fine-tuned on patch-level labels: ResNet50 and Vision Transformer (ViT-B/16).
Achieved an AUC of 0.9860 in cancer detection, surpassing ResNet50 (AUC 0.9137) and ViT-B/16 (AUC 0.9478).
Demonstrated interpretability: attention visualisations show the model focusing on clinically meaningful tissue architecture features, without requiring pixel-level annotations.
Concluded that weakly supervised MIL is a promising and practical approach for cancer detection in resource-constrained clinical settings.
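
To make the bagging step concrete, the following is a minimal sketch of how patches might be grouped into bags of 32 with one label per bag. It is illustrative only and not the authors' code: the function name make_bags, the random shuffling, the dropping of leftover patches, and the rule that a bag is positive when at least one patch is positive (the usual MIL assumption) are all assumptions that the summary above does not confirm.

```python
import random


def make_bags(patch_labels, bag_size=32, seed=0):
    """Group patch indices into fixed-size bags and derive one label per bag.

    Assumption (not stated in the source): a bag is positive if at least one
    of its patches is positive, which is the standard MIL labelling rule.
    """
    indices = list(range(len(patch_labels)))
    random.Random(seed).shuffle(indices)
    bags = []
    # Drop the trailing remainder so every bag has exactly bag_size patches.
    for start in range(0, len(indices) - bag_size + 1, bag_size):
        members = indices[start:start + bag_size]
        bag_label = int(any(patch_labels[i] == 1 for i in members))
        bags.append((members, bag_label))
    return bags


# Toy example: 100 patches with hypothetical labels, roughly 10% positive.
rng = random.Random(1)
toy_labels = [1 if rng.random() < 0.1 else 0 for _ in range(100)]
bags = make_bags(toy_labels)
print(len(bags), [label for _, label in bags])
```

During training only the bag label would be shown to the model; the patch labels are used here solely to derive it.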

Abstract

Accurate cancer detection from histopathology images typically requires large amounts of patch-level annotated data, which is costly and time-consuming to obtain in clinical settings. In practice, only patient-level or slide-level diagnoses are routinely available, motivating the development of weakly supervised learning approaches. In this work, we implement and evaluate an attention-based Multiple Instance Learning (MIL) framework for cancer detection on the PatchCamelyon (PCam) benchmark — a patch-level dataset derived from the CAMELYON16 whole-slide image challenge for lymph node metastasis detection — using only slide-level labels during training. Patches are grouped into bags of 32, and a lightweight attention network learns to identify the most diagnostically relevant patches without patch-level supervision. We compare our weakly supervised approach against two fully supervised baselines — ResNet50 and Vision Transformer (ViT-B/16) fine-tuned on patch-level labels. Our attention-based MIL achieves an AUC of 0.9860, outperforming ResNet50 (AUC: 0.9137) and ViT-B/16 (AUC: 0.9478) despite using significantly weaker supervision. Attention visualisations reveal that the model learns to focus on clinically meaningful tissue architecture features, demonstrating interpretability without requiring pixel-level annotations. These results suggest that weakly supervised MIL is a promising and practical approach for cancer detection in resource-constrained clinical environments.
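
The attention mechanism described in the abstract can be illustrated with a small sketch. The abstract does not specify the exact form of the attention network, so the code below assumes the commonly used formulation in which each patch embedding h_k receives a score w·tanh(V h_k), the scores are normalised with a softmax over the bag, and the bag embedding is the weighted sum of patch embeddings. The names attention_pool, V and w, and the toy dimensions, are hypothetical; in a real model V and w would be learned and the patch embeddings would come from a feature extractor.

```python
import math
import random


def attention_pool(instances, V, w):
    """Attention pooling over one bag (illustrative, not the authors' code).

    instances: list of K feature vectors, each of length D
    V: L x D projection matrix; w: length-L scoring vector
    Returns (attention weights, pooled bag embedding).
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))
    # Softmax over the bag; subtracting the max keeps exp() numerically stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag embedding: attention-weighted sum of the patch embeddings.
    dim = len(instances[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return weights, pooled


# Toy bag: 32 patches with 8-dimensional random embeddings.
rng = random.Random(0)
K, D, L = 32, 8, 4
bag = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
V = [[rng.gauss(0, 0.5) for _ in range(D)] for _ in range(L)]
w = [rng.gauss(0, 0.5) for _ in range(L)]
weights, pooled = attention_pool(bag, V, w)
print(len(weights), round(sum(weights), 6), len(pooled))
```

The weights sum to one across the bag, so they can be read as a per-patch relevance ranking. Under this formulation, that is what makes attention visualisations possible without patch-level labels.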