What question did this study set out to answer?

The goal is to assess the performance of a passive dietary assessment system using LVLMs in real-world settings.

June 9, 2026

Enabling dietary assessment with smart wearables: a passive, scalable approach for precision nutrition

Key Points

Feasibility study conducted with 30 participants at two centres (Hammersmith Hospital and the University of Reading) wearing customised cameras side-mounted on glasses.
Participants consumed two standardised diets, each over four study days, while their dietary intake was monitored.
The pipeline blurred faces and screens for privacy, then used an LVLM to extract eating episodes, recognise food items, and estimate portion sizes.
Food-item recall from captured imagery was 82% (95% CI 81–84%).
Mean absolute error for portion-size estimation was 44.7 g (95% CI 42.2–47.3 g) for food and 70.4 mL (95% CI 67.4–73.4 mL) for beverages.
The results provide foundational evidence for LVLM-enabled passive dietary monitoring and support further validation for precision nutrition.

Abstract

Self-reported dietary assessment has long limited progress in precision nutrition because of unrecorded eating episodes, recall bias and portion-size estimation errors (1). We developed a system that combines customised wearable cameras with reasoning-enabled large vision–language models (LVLMs) in a fully automated pipeline for scalable, objective dietary assessment. Beyond reducing user burden, passive capture can record brief or opportunistic eating episodes that are typically missed, while LVLMs improve food identification across heterogeneous contexts. However, the feasibility, privacy safeguards, and quantitative performance of such systems remain underexplored. This study evaluates the performance of the LVLM-enabled passive system in real-world deployments, focusing on its ability to capture and analyse dietary intake accurately.

A feasibility study was conducted at two centres, Hammersmith Hospital and the University of Reading (2), where thirty UK participants wore customised cameras side-mounted on glasses (STM32 microcontroller; 128 GB SD card; rechargeable) throughout waking hours whilst consuming two highly controlled, standardised diets, one compliant with UK healthy eating guidelines and one not. Each diet was consumed over four study days, during which participants remained in the facility and ate meals provided by the study team. A preprocessing pipeline first blurred faces and screens in captured images for privacy protection. The LVLM-based pipeline then performed three tasks: (i) extracting eating episodes; (ii) recognising food items across heterogeneous settings; and (iii) context-aware portion-size estimation, using cues from containers, utensils, and hands to mitigate monocular visual scale ambiguity (3). Model outputs were benchmarked against a dietitian-verified reference menu and the weights of food portions consumed. Eating sessions with no captured images were excluded from subsequent analyses.

Data passively captured with wearable cameras from 30 participants (Hammersmith, n=15; Reading, n=15) over eight study days yielded 2.08 million raw images at Hammersmith and 2.15 million at Reading. After privacy filtering and removal of redundant frames, 0.49% and 0.46% of images were retained at the two sites, respectively. Overall, food-item recall from passively captured imagery was 82% (95% CI 81–84%). Portion-size estimation showed a mean absolute error of 44.7 g (95% CI 42.2–47.3 g) for food items and 70.4 mL (95% CI 67.4–73.4 mL) for beverages against weighed consumed portions.

This feasibility study provides foundational evidence for LVLM-enabled, passive, camera-based dietary monitoring and supports progression to real-world deployment. The results support further multi-site validation, the inclusion of metrics beyond recall (e.g., energy and macronutrient assessment), and assessment of performance across settings (home vs out-of-home) and subgroups, with the aim of capturing nutrient intake at population level and enhancing precision nutrition approaches.
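The reported metrics can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the set-based definition of recall, and all sample values below are assumptions made for illustration, and the abstract does not state how the confidence intervals were computed, so none are calculated here. The retained-image counts are only approximate, because the retention percentages in the abstract are rounded.

```python
from statistics import mean


def item_recall(reference_items, predicted_items):
    """Fraction of reference (menu) items that also appear in the model output.

    Assumption: items are matched by exact name; the study may use a
    different matching rule.
    """
    reference = set(reference_items)
    if not reference:
        raise ValueError("reference_items must not be empty")
    return len(reference & set(predicted_items)) / len(reference)


def mean_absolute_error(weighed, estimated):
    """Mean of |weighed - estimated| over paired portions (same unit, g or mL)."""
    if len(weighed) != len(estimated) or not weighed:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(w - e) for w, e in zip(weighed, estimated))


def retained_images(raw_count, retained_percent):
    """Approximate number of images kept after filtering."""
    return round(raw_count * retained_percent / 100)


if __name__ == "__main__":
    # Hypothetical eating episode: five menu items, four of them recognised.
    menu = ["porridge", "banana", "toast", "orange juice", "yoghurt"]
    predicted = ["porridge", "toast", "orange juice", "yoghurt", "tea"]
    print(item_recall(menu, predicted))  # 0.8

    # Hypothetical weighed vs estimated portions in grams.
    print(mean_absolute_error([150.0, 80.0, 200.0], [120.0, 95.0, 230.0]))  # 25.0

    # Raw counts and retention rates as quoted in the abstract.
    print(retained_images(2_080_000, 0.49))  # 10192 (Hammersmith, approximate)
    print(retained_images(2_150_000, 0.46))  # 9890 (Reading, approximate)
```

On these figures, roughly ten thousand images per site remain after privacy filtering and removal of redundant frames.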

Lo et al. studied this question.

synapsesocial.com/papers/6a27ae3fa963992e16268525 https://doi.org/10.1017/s0029665126102936
