The analysis of volatile organic compounds (VOCs) in exhaled breath has emerged as a promising non-invasive approach for detecting various pathologies, including tuberculosis (TB). Current analysis processes are tedious and labour-intensive because breath samples contain thousands of VOCs at varying concentrations. This study explored machine learning classifiers to analyse breath samples from TB patients, multidrug-resistant TB patients, and control subjects. The classifiers varied in accuracy, with the support vector machine (SVM) proving the most accurate model. The SVM classifier was therefore used for further in-depth studies of the retention time intervals that contribute the most variation among the groups. The results show that the retention time interval of 10–30 min, in which the corresponding VOCs elute, emerged as the region with the most distinguishing features when applied to breath samples from the TB cohorts. The findings highlight that machine learning models enable fast and efficient processing of gas chromatography-mass spectrometry (GC-MS) data for VOC detection and analytical profiling. This work underscores the potential of machine learning in disease diagnosis.
Mpolokang et al. (Wed,) studied this question.
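The retention-time interval analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the class labels, the window boundaries, the linear kernel, the scaling step, and the helper name `window_accuracies` are all assumptions made for illustration, and the class-dependent signal is placed in the 10–30 min window by construction. The sketch only shows one way an SVM could be scored on separate retention-time windows using scikit-learn's cross-validation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def window_accuracies(X, y, retention_times, windows, cv=5):
    """Mean cross-validated SVM accuracy using only the features whose
    retention time falls inside each [start, end) window.
    Hypothetical helper, not taken from the study."""
    scores = {}
    for start, end in windows:
        mask = (retention_times >= start) & (retention_times < end)
        # Scale features, then fit a linear-kernel SVM (kernel is an assumption).
        model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
        scores[(start, end)] = cross_val_score(model, X[:, mask], y, cv=cv).mean()
    return scores


# Synthetic stand-in data, NOT measurements from the study.
rng = np.random.default_rng(0)
retention_times = np.arange(0.0, 40.0, 0.5)   # one feature per 0.5 min
y = np.repeat([0, 1, 2], 20)                  # 0 = control, 1 = TB, 2 = MDR-TB
X = rng.normal(size=(y.size, retention_times.size))

# Inject a class-dependent intensity shift into 10-30 min by construction,
# so this window is the informative one in the toy data.
in_window = (retention_times >= 10) & (retention_times < 30)
X[:, in_window] += 1.0 * y[:, None]

windows = [(0, 10), (10, 30), (30, 40)]
scores = window_accuracies(X, y, retention_times, windows)
best = max(scores, key=scores.get)

for window, acc in scores.items():
    print(f"{window[0]:>2}-{window[1]:<2} min: accuracy {acc:.2f}")
print("most discriminative window:", best)
```

With real GC-MS data, `X` would hold aligned peak intensities per sample and `retention_times` the corresponding retention time of each feature; the same loop could be repeated with other classifiers to compare their accuracies.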