The development of vision-language models (VLMs) for histopathology has enabled promising new applications and strong zero-shot performance. However, current approaches, which decompose large slides into smaller patches, focus solely on inductive classification, i.e., the prediction for each patch is made independently of the other patches in the target test data. We extend the capability of these large models by introducing a transductive approach. By using text-based predictions and affinity relationships among patches, our approach leverages the strong zero-shot capabilities of these new VLMs without any additional labels. Our experiments cover four histopathology datasets and five different VLMs. Operating solely in the embedding space (i.e., in a black-box setting), our approach is highly efficient, processing 10⁵ patches in just a few seconds, and shows significant accuracy improvements over inductive zero-shot classification. Code is available at https://github.com/FereshteShakeri/Histo-TransCLIP.
Zanella et al. studied this question.
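The abstract does not spell out the transductive objective, so the following is only a rough illustration of the general idea of combining text-based predictions with patch-to-patch affinities, not Histo-TransCLIP's actual method. It swaps in the classic label-propagation scheme of Zhou et al. ("Learning with Local and Global Consistency"), seeded with inductive zero-shot probabilities. Everything here is an assumption: the function names (`zero_shot_probs`, `knn_affinity`, `transductive_refine`), the temperature of 0.01, the k-nearest-neighbour graph, the mixing weight `alpha`, and the toy embeddings. Like the approach described above, it touches only embeddings (black-box), never the VLM's weights.

```python
import numpy as np

# Hypothetical helpers illustrating transductive refinement via label propagation.
# This is NOT the Histo-TransCLIP algorithm; it is a generic sketch.

def zero_shot_probs(img_emb, txt_emb, temperature=0.01):
    """Inductive zero-shot: softmax over cosine similarities to class text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # temperature value is an assumption
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def knn_affinity(img_emb, k=3):
    """Symmetric k-nearest-neighbour affinity matrix built from cosine similarity."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    sim = img @ img.T
    np.fill_diagonal(sim, -np.inf)  # a patch is not its own neighbour
    n = sim.shape[0]
    w = np.zeros_like(sim)
    nn = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar patches
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    w[rows, cols] = np.clip(sim[rows, cols], 0.0, None)  # drop negative similarities
    return np.maximum(w, w.T)  # symmetrise

def transductive_refine(p0, w, alpha=0.8, n_iter=20):
    """Label propagation seeded with the zero-shot probabilities p0 (assumed scheme)."""
    d = w.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    s = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    y = p0.copy()
    for _ in range(n_iter):
        # Mix neighbours' current scores with the original text-based prediction.
        y = alpha * s @ y + (1 - alpha) * p0
    return y / y.sum(axis=1, keepdims=True)

# Toy data (made up): three clearly class-0 patches, one ambiguous patch that
# clusters with them but leans slightly toward class 1 under zero-shot, and three
# class-1 patches. Dimension 2 carries the shared "tissue" signal that groups patches.
img = np.array([[0.30, 0.10, 1.0], [0.32, 0.12, 1.0], [0.31, 0.09, 1.0],
                [0.20, 0.22, 1.0],
                [0.10, 0.30, -1.0], [0.12, 0.32, -1.0], [0.09, 0.31, -1.0]])
txt = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

p0 = zero_shot_probs(img, txt)
y = transductive_refine(p0, knn_affinity(img, k=3))
print("inductive:   ", p0.argmax(axis=1))
print("transductive:", y.argmax(axis=1))
```

In this toy setup the ambiguous patch (index 3) is assigned to class 1 inductively but moves to class 0 once its neighbours' predictions are taken into account, which is the kind of effect transduction aims for. A real implementation over 10⁵ patches would use a sparse k-NN graph rather than the dense `n × n` matrix used here for readability.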