Cancer remains one of the leading causes of death worldwide, and its diagnosis and treatment rely fundamentally on histopathological examination of tissue samples. This thesis advances data-driven histopathology along two complementary axes: (i) methodological innovations for weakly supervised whole slide image classification (WSWSIC) and (ii) agentic frameworks that democratise access to such methods. First, I introduce HoechstGAN, a generative model that jointly synthesises multiple immunofluorescence stains (CD3, CD8) from a single inexpensive Hoechst stain, exploiting the biological relationship between CD3⁺ and CD8⁺ cells. Second, I systematically reassess long-standing assumptions in WSWSIC. In a large-scale benchmarking study involving over 10,000 trained models, I demonstrate that the choice of feature extractor is the dominant driver of downstream performance, while stain normalisation – an expensive preprocessing step that has been standard practice for over two decades – yields no consistent benefit with modern self-supervised pathology foundation models. My analysis of these models' latent spaces reveals that self-supervised pathology encoders are inherently robust to image transformations. Furthermore, I find that extracting features at lower magnification delivers memory and compute savings that grow quadratically with the reduction in magnification, while maintaining classification accuracy. Third, I introduce DasMIL, a multiple instance learning architecture with a novel distance-aware self-attention mechanism that encodes relative spatial relationships between patches, improving slide-level prediction over spatially agnostic baselines. Finally, I point the way towards autonomous AI scientists by demonstrating that LLM agents can autonomously transform research papers with associated code repositories into executable, LLM-compatible tools. I propose an agentic framework, ToolMaker, that achieves 80% task success across 15 scientific tasks and enables natural-language access for non-programmers.
Collectively, these contributions chart a path from optimising individual components in computational pathology pipelines towards developing agentic systems that broaden the reach of computational pathology and adjacent fields by providing building blocks for autonomous scientific discovery.
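The quadratic saving from lower-magnification feature extraction follows from simple geometry: halving the magnification halves both slide dimensions in pixels, so a fixed-size patch grid shrinks by roughly a factor of four. The sketch below illustrates only this arithmetic. The slide dimensions, the 40x scan magnification, the 224-pixel patch size, and the helper name `patch_count` are illustrative assumptions and are not taken from the thesis.

```python
import math


def patch_count(width_px, height_px, base_mag, target_mag, patch_px=224):
    """Number of non-overlapping patches needed to tile a slide.

    width_px, height_px: slide size in pixels at base_mag (hypothetical values).
    target_mag: magnification at which features would be extracted.
    patch_px: side length of a square patch in pixels (assumed, not from the thesis).
    """
    scale = target_mag / base_mag  # linear downscaling factor
    cols = math.ceil(width_px * scale / patch_px)
    rows = math.ceil(height_px * scale / patch_px)
    return cols * rows


# Hypothetical slide scanned at 40x with 100,000 x 80,000 pixels.
n20 = patch_count(100_000, 80_000, base_mag=40, target_mag=20)
n10 = patch_count(100_000, 80_000, base_mag=40, target_mag=10)

# Halving the magnification cuts the patch count (and hence the number of
# feature vectors to compute and store) by roughly a factor of four.
print(n20, n10, round(n20 / n10, 2))
```

The ratio is only approximately four because partially filled patches at the slide border are rounded up. Background filtering, which real pipelines usually apply before feature extraction, is ignored here.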
Georg Alexander Wölflein studied this question.