What question did this study set out to answer?

This research aims to develop benchmark datasets and assess large language models for matching dietary data with food composition databases.

June 20, 2026 | Open Access

Evaluation of Large Language Models for Mapping Dietary Data to Food Databases

Key Points

Developed two benchmark datasets: ASA24-to-FooDB and NHANES-to-DFG2.
Tested various matching methods including fuzzy matching, TF-IDF, semantic embedding, and LLMs.
Implemented a hybrid approach combining semantic mapping with LLM reranking.
Semantic embedding achieved 87.8% accuracy for ASA24-to-FooDB and 48.0% for NHANES-to-DFG2.
LLMs reached 62.6% accuracy on NHANES-to-DFG2 but, when given the entire FCD, performed worse on ASA24-to-FooDB.
The hybrid approach resulted in overall accuracies of 90.7% for ASA24-to-FooDB and 65.4% for NHANES-to-DFG2.
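
The hybrid approach listed above (semantic retrieval of a shortlist, then a reranking step that may also answer "no match") can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_embed` is a bag-of-words stand-in for a real sentence-embedding model, `toy_rerank` is a rule-based placeholder for the LLM call, and every function name here is hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Optional

Vector = Dict[str, float]


def tokens(text: str) -> List[str]:
    """Lowercase the text and split on whitespace, treating commas as spaces."""
    return text.lower().replace(",", " ").split()


def toy_embed(text: str) -> Vector:
    """ASSUMPTION: stand-in for a semantic embedding model (word-count vector)."""
    return dict(Counter(tokens(text)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: str, candidates: List[str],
          embed: Callable[[str], Vector], k: int = 5) -> List[str]:
    """Return the k database entries most similar to the query."""
    q = embed(query)
    ranked = sorted(candidates, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def toy_rerank(query: str, shortlist: List[str]) -> Optional[str]:
    """ASSUMPTION: placeholder for the LLM step. Picks the first shortlisted
    entry sharing a word with the query, or None to signal "no match"."""
    q = set(tokens(query))
    for cand in shortlist:
        if q & set(tokens(cand)):
            return cand
    return None


def hybrid_match(query: str, candidates: List[str],
                 embed: Callable[[str], Vector],
                 rerank: Callable[[str, List[str]], Optional[str]],
                 k: int = 5) -> Optional[str]:
    """Retrieve a shortlist by similarity, then let the reranker choose."""
    return rerank(query, top_k(query, candidates, embed, k))


foods = ["Apple, raw", "Apple juice", "Banana, raw", "Milk, whole"]
print(hybrid_match("apple raw with skin", foods, toy_embed, toy_rerank, k=2))
print(hybrid_match("tofu stir fry", foods, toy_embed, toy_rerank, k=2))
```

Passing `embed` and `rerank` as callables keeps the retrieval and reranking stages independent, so a real embedding model or LLM client could be swapped in without changing `hybrid_match`.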

Abstract

BACKGROUND: New food databases increasingly provide biochemical information not yet captured in standard food composition databases (FCDs). To enable precision nutrition, new methods are needed to map foods to these FCDs.

OBJECTIVE: We sought to provide real-world ground truth (benchmark) datasets and evaluate the use of large language models (LLMs) to match foods reported in dietary data with foods in FCDs.

METHODS: Two ground truth (benchmark) datasets were developed. ASA24-to-FooDB included a large FCD (9,910 entries) with many similar or perfect matches. NHANES-to-DFG2 included a small FCD (256 entries) with imperfect matches or "No Match" (46.9%). Matching methods tested included fuzzy matching, TF-IDF, semantic embedding, and LLMs.

RESULTS: Food text description mapping using similarity scores from semantic embedding performed better on both ground truth datasets (87.8% accuracy, ASA24-to-FooDB; 48.0% accuracy, NHANES-to-DFG2) than fuzzy matching or TF-IDF. LLMs performed worse on ASA24-to-FooDB when given the entire FCD, but better on NHANES-to-DFG2 (62.6% accuracy). For foods where a correct match exists, semantic similarity yielded top-k accuracies of 85% at k=5 and 95% at k=10 for ASA24-to-FooDB, and 96% at k=5 and 98% at k=10 for NHANES-to-DFG2. A hybrid approach using semantic embeddings to select the top k matches to prompt LLMs yielded overall accuracies of 90.7% on ASA24-to-FooDB and 65.4% on NHANES-to-DFG2. An investigation of different prompt strategies and model sizes demonstrated that simpler prompts worked better for larger LLMs, while smaller LLMs needed detailed instructions. To assist nutrition scientists, the best strategy (semantic mapping + LLM reranking) was implemented in an application: FoodMapper (https://foodmapper.app/).

CONCLUSIONS: To match food text descriptions to FCDs, identifying top matches using semantic similarity followed by an LLM to choose from among those matches or "no match" resulted in the highest accuracy. FoodMapper provides users with the best solution in a user-friendly interface that facilitates manual review.
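
The two kinds of accuracy reported in the abstract can be made concrete with a short sketch. The abstract does not spell out the exact definitions, so the ones below are assumptions: top-k accuracy is computed only over foods that have a correct match, and overall accuracy counts a "no match" prediction (represented here as `None`) as correct when the ground truth is also "no match". The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def top_k_accuracy(ranked_lists: Sequence[Sequence[str]],
                   truths: Sequence[Optional[str]], k: int) -> float:
    """Share of foods with a true match whose match is in the first k candidates.

    Foods whose ground truth is None ("no match") are excluded (ASSUMPTION).
    """
    pairs = [(r, t) for r, t in zip(ranked_lists, truths) if t is not None]
    if not pairs:
        raise ValueError("no foods with a ground-truth match")
    hits = sum(1 for r, t in pairs if t in r[:k])
    return hits / len(pairs)


def overall_accuracy(predictions: Sequence[Optional[str]],
                     truths: Sequence[Optional[str]]) -> float:
    """Share of all foods where the prediction equals the ground truth,
    including correct "no match" (None) predictions (ASSUMPTION)."""
    if len(predictions) != len(truths) or not truths:
        raise ValueError("predictions and truths must be non-empty and equal length")
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)


# Toy data: three foods, the last of which has no correct database entry.
ranked: List[List[str]] = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"]]
truth: List[Optional[str]] = ["b", "z", None]
print(top_k_accuracy(ranked, truth, k=2))
print(overall_accuracy(["b", "x", None], truth))
```

Under these definitions, a high top-k accuracy only shows that the correct entry reaches the shortlist; the final choice, including the "no match" decision, is what overall accuracy measures.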
