The human face and tongue each carry distinct visual signatures of nutritional and systemic health status. This paper introduces the first computational system to combine both modalities in a single probabilistic framework for nutritional deficiency screening. Building on a previously published face analysis pipeline, we develop a complete tongue analysis module comprising a YOLOv8m detector trained on a unified 9,125-image tongue dataset across 12 feature classes (mAP@0.5 = 0.812), region-aware DINOv2 ViT-S/14 feature extraction from five anatomical tongue zones producing 1920-dimensional embeddings, a severity regression MLP with Monte Carlo Dropout uncertainty quantification (mean F1 = 0.800), and a calibrated tongue-specific Bayesian inference engine. The tongue module introduces four deficiency categories not detectable from skin alone (liver stress, gut dysbiosis, hypothyroid tendency, and folate deficiency), extending diagnostic coverage from 11 to 15 categories (a 36% increase). The two modalities are combined through a weighted product-of-experts posterior fusion (face weight α = 0.55, tongue weight β = 0.45). Ablation experiments across six fusion configurations confirm that face and tongue are genuinely complementary, with only 48.9% top-1 agreement between them. The full combined pipeline processes a face and tongue image pair in under 120 ms on an NVIDIA RTX 4070 Super.
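The weighted product-of-experts fusion named in the abstract can be illustrated with a minimal sketch. It assumes the common form in which the fused posterior is proportional to p_face(c)^α · p_tongue(c)^β, and it assumes both posteriors are defined over the same category list; the abstract does not say how the four tongue-only categories are handled on the face side, so that alignment is not modeled here. The function name `fuse_posteriors`, the `eps` floor, and the example probabilities are hypothetical and are not taken from the paper.

```python
import math


def fuse_posteriors(p_face, p_tongue, alpha=0.55, beta=0.45, eps=1e-12):
    """Weighted product-of-experts over two categorical posteriors.

    Assumed form: fused(c) is proportional to p_face(c)**alpha * p_tongue(c)**beta.
    Both inputs must cover the same categories in the same order.
    """
    if len(p_face) != len(p_tongue):
        raise ValueError("posteriors must cover the same categories")

    # Work in log space so that very small probabilities do not underflow.
    # eps is a hypothetical floor that keeps log() defined for zero entries.
    log_scores = [
        alpha * math.log(max(pf, eps)) + beta * math.log(max(pt, eps))
        for pf, pt in zip(p_face, p_tongue)
    ]

    # Subtract the maximum before exponentiating, then renormalize to sum to 1.
    peak = max(log_scores)
    unnormalized = [math.exp(score - peak) for score in log_scores]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]


# Invented three-category posteriors, for illustration only.
face = [0.7, 0.2, 0.1]
tongue = [0.3, 0.6, 0.1]
fused = fuse_posteriors(face, tongue)
```

Because α + β = 1 in this configuration, fusing a distribution with itself returns it unchanged, and a category is ranked highly only when neither modality assigns it a very low probability.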
Abdul Moiz Muhammad
COMSATS University Islamabad
www.synapsesocial.com/papers/69d1fe18a79560c99a0a49d5 — DOI: https://doi.org/10.5281/zenodo.19411125