Computer-aided diagnosis (CAD) systems are used in the medical field to assist clinicians in diagnosing diseases. CAD can analyze text, audio, and medical images, offering high accuracy and efficiency in medical diagnoses, particularly when it employs deep learning. However, because deep-learning models behave as black boxes, concerns about their interpretability and reliability have undermined patient and physician confidence in AI-guided clinical diagnoses. As a result, research has increasingly focused not only on performance but also on improving trust and transparency. This study proposes a preliminary Trustworthiness Indicator that quantifies reliability and trustworthiness numerically. Two problems are used to examine the indicator's behavior and performance experimentally: binary melanoma detection on dermoscopic images and multimodal diabetic retinopathy grading on ocular fundus images. The indicator's results were compared with standard metrics to explore potential correlations, weaknesses, and robustness.
Biasi et al. studied this question.