What question did this study set out to answer?

This research aims to develop a computational system that integrates facial and lingual biomarkers for enhanced nutritional deficiency screening.

April 5, 2026 | Open Access

Multi-Modal Visual Health Assessment Through Product-of-Experts Posterior Fusion of Facial and Lingual Biomarkers

Key Points

Developed a tongue analysis module built on a YOLOv8m detector trained on a unified 9,125-image tongue dataset spanning 12 feature classes (mAP@0.5 = 0.812).
Implemented region-aware DINOv2 ViT-S/14 feature extraction from five anatomical tongue zones, producing 1920-dimensional embeddings.
Used a severity regression MLP with Monte Carlo Dropout for uncertainty quantification (mean F1 = 0.800).
Combined the face and tongue posteriors through weighted product-of-experts fusion (face weight α = 0.55, tongue weight β = 0.45).
Introduced four deficiency categories not detectable from skin alone: liver stress, gut dysbiosis, hypothyroid tendency, and folate deficiency.
Extended diagnostic coverage from 11 to 15 categories, a 36% increase.
Measured only 48.9% top-1 agreement between the facial and tongue modalities, supporting the claim that they are complementary.
Processes a face and tongue image pair in under 120 ms on an NVIDIA RTX 4070 Super.

Abstract

The human face and tongue each carry distinct visual signatures of nutritional and systemic health status. This paper introduces the first computational system to combine both modalities in a single probabilistic framework for nutritional deficiency screening. Building on a previously published face analysis pipeline, we develop a complete tongue analysis module comprising four components: a YOLOv8m detector trained on a unified 9,125-image tongue dataset across 12 feature classes (mAP@0.5 = 0.812); region-aware DINOv2 ViT-S/14 feature extraction from five anatomical tongue zones, producing 1920-dimensional embeddings; a severity regression MLP with Monte Carlo Dropout uncertainty quantification (mean F1 = 0.800); and a calibrated tongue-specific Bayesian inference engine. The tongue module introduces four deficiency categories not detectable from skin alone (liver stress, gut dysbiosis, hypothyroid tendency, and folate deficiency), extending diagnostic coverage from 11 to 15 categories (a 36% increase). Both modalities are combined through a weighted product-of-experts posterior fusion (face weight α = 0.55, tongue weight β = 0.45). Ablation experiments across six fusion configurations confirm that face and tongue are genuinely complementary modalities, with only 48.9% top-1 agreement between them. The full combined pipeline processes a face and tongue image pair in under 120 ms on an NVIDIA RTX 4070 Super.
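To make the fusion step concrete, the following is a minimal sketch of a weighted product-of-experts combination, in which the fused posterior is proportional to p_face(c)^α · p_tongue(c)^β. It is not the authors' implementation. The function name `fuse_posteriors`, the placeholder category names, and the example probabilities are all hypothetical, and the sketch assumes both posteriors are defined over the same category set; the abstract does not say how categories covered by only one modality are handled.

```python
import math


def fuse_posteriors(p_face, p_tongue, alpha=0.55, beta=0.45, eps=1e-12):
    """Weighted product-of-experts: p(c) is proportional to
    p_face(c)**alpha * p_tongue(c)**beta, renormalised to sum to 1.

    Assumption: both inputs are dicts over the same category names.
    """
    if p_face.keys() != p_tongue.keys():
        raise ValueError("both posteriors must cover the same categories")

    # Combine in log space for numerical stability; eps guards log(0).
    log_scores = {
        c: alpha * math.log(p_face[c] + eps) + beta * math.log(p_tongue[c] + eps)
        for c in p_face
    }

    # Subtract the maximum before exponentiating, then normalise.
    peak = max(log_scores.values())
    unnormalised = {c: math.exp(s - peak) for c, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {c: v / total for c, v in unnormalised.items()}


# Hypothetical posteriors over three placeholder categories.
p_face = {"category_a": 0.6, "category_b": 0.3, "category_c": 0.1}
p_tongue = {"category_a": 0.2, "category_b": 0.3, "category_c": 0.5}

fused = fuse_posteriors(p_face, p_tongue)
for name, prob in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {prob:.3f}")
```

Because the weights sum to 1, fusing a posterior with itself returns it unchanged. When the two modalities disagree, the fused distribution favours categories that neither expert rules out.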

Authors

Abdul Moiz Muhammad

Institutions

COMSATS University Islamabad
