Abstract
Branded food data are essential for assessing contemporary dietary behavior and the global food environment. However, processing such data is challenging because they are vast, change rapidly, vary in quality, and come from numerous sources. To address these limitations, we developed a fully automated pipeline powered by large language models (LLMs) that collects, standardizes, and enriches branded food data. The pipeline enables ingredient-level analyses and facilitates the estimation of ingredient quantities and undeclared nutrient content. In our evaluation, a fine-tuned model outperformed the human experts in parsing and mapping product data. Non-fine-tuned LLMs performed insufficiently, whereas even modest amounts of fine-tuning data substantially improved results. Overall, LLMs offer a scalable approach to processing branded food data and support more consistent, standardized data curation, with performance exceeding that of an individual human expert. These results highlight the potential of LLMs to transform the management and analysis of complex, large-scale food databases.
Hauff et al. studied this question.