What question did this study set out to answer?

June 12, 2026 | Open Access

A CLIP-based framework for multiclass lung histopathology classification with prompt engineering and class-imbalance-aware focal optimization

Key Points

Developed a CLIP-based framework for multiclass lung histopathology classification covering benign lung tissue, lung adenocarcinoma, and lung squamous cell carcinoma.
Implemented a pretrained CLIP ViT-B/32 backbone with domain-specific prompt engineering and similarity-based classification in a shared embedding space.
Organized the dataset into training, validation, and testing splits, with 3,500 training images and 500 validation images per class.
Trained with Focal Loss, AdamW optimization, OneCycle learning rate scheduling, and early stopping.
Reached a best validation accuracy of 95.20%, with early stopping triggered at epoch 23.
Reported a macro AUC of 0.9870 and a micro AUC of 0.9877, indicating strong classification performance.
Observed steady performance improvement across training epochs on a Tesla T4 GPU.
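
The macro and micro AUC figures above average one-vs-rest results in different ways. The sketch below shows one common way to compute both, using only the Python standard library. The summary does not say how the paper computed its AUC values, so the function names (`binary_auc`, `macro_micro_auc`) and the one-vs-rest formulation are assumptions for illustration, not the authors' evaluation code.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true, probs, num_classes):
    """Return (macro, micro) one-vs-rest AUC.

    y_true: list of integer class ids.
    probs:  list of per-sample probability lists, one entry per class.
    """
    per_class = []
    flat_labels, flat_scores = [], []
    for c in range(num_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [p[c] for p in probs]
        per_class.append(binary_auc(labels, scores))
        # Micro averaging pools every (sample, class) pair into one binary problem.
        flat_labels.extend(labels)
        flat_scores.extend(scores)
    macro = sum(per_class) / num_classes
    micro = binary_auc(flat_labels, flat_scores)
    return macro, micro
```

With equal images per class, as in the reported training and validation partitions, macro and micro AUC tend to be close, which is consistent with the near-identical values reported here.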

Abstract

Lung cancer remains one of the leading causes of cancer-related mortality worldwide, and accurate histopathological classification is essential for timely diagnosis and treatment planning. This study presents a Contrastive Language-Image Pretraining (CLIP)-based framework for multiclass lung histopathology classification, designed to distinguish among benign lung tissue, lung adenocarcinoma, and lung squamous cell carcinoma. The proposed approach leverages a pretrained CLIP ViT-B/32 backbone, domain-specific prompt engineering, multimodal image-text pairing, and similarity-based classification within a shared embedding space. To strengthen convergence and robustness during fine-tuning, the training pipeline incorporates data augmentation, Focal Loss, AdamW optimization, OneCycle learning rate scheduling, mixed-precision training, gradient clipping, and early stopping. The dataset is organized into separate training, validation, and testing splits, with the reported training and validation partitions containing 3,500 and 500 images per class, respectively. Experimental training on a Tesla T4 GPU demonstrated steady performance improvement across epochs, with the best validation accuracy reaching 95.20%, accompanied by a macro AUC of 0.9870 and a micro AUC of 0.9877, before early stopping was triggered at epoch 23. These findings indicate that integrating CLIP with pathology-specific text prompts provides a strong and reliable framework for automated lung cancer histopathology classification, with promising potential for future intelligent digital pathology systems.
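
The similarity-based classification and Focal Loss described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption: the prompt wording in `CLASS_PROMPTS`, the `logit_scale` of 100, and the focal parameters `gamma=2.0` and `alpha=1.0` are illustrative defaults, since the abstract does not give the paper's actual prompts or hyperparameters. In a real pipeline the embeddings would come from the CLIP image and text encoders; here they are plain lists of floats.

```python
import math

# Hypothetical prompts, one per class; the paper's real prompt text is not given.
CLASS_PROMPTS = [
    "a histopathology image of benign lung tissue",
    "a histopathology image of lung adenocarcinoma",
    "a histopathology image of lung squamous cell carcinoma",
]


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_logits(image_emb, text_embs, logit_scale=100.0):
    """Scaled cosine similarity between one image embedding and each prompt embedding."""
    img = l2_normalize(image_emb)
    return [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in text_embs
    ]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def softmax(logits):
    return [math.exp(lp) for lp in log_softmax(logits)]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Multiclass focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t)."""
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    return -alpha * (1.0 - p_t) ** gamma * log_p_t
```

The predicted class is the index of the largest logit, which maps back to an entry of `CLASS_PROMPTS`. Setting `gamma=0` reduces the loss to ordinary cross-entropy; larger values shrink the contribution of samples the model already classifies confidently, so training focuses on the harder ones.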
