What question did this study set out to answer?

The study evaluates the in-context learning (ICL) abilities of large language models (LLMs) in predicting molecular properties, and asks whether that ability is genuine or driven by shortcut cues.

January 18, 2026

Evaluating In‐Context Learning in Large Language Models for Molecular Property Regression

Key Points

Assessed seven large language models on molecular property prediction tasks.
Used a controlled framework of 56 transformed tasks to isolate shortcut learning.
Compared LLM performance under nonlinear transformations against machine learning (ML) baselines.
LLMs performed nearly perfectly on raw molecular weight prediction by exploiting shortcut cues.
LLM performance deteriorated under nonlinear transformations, whereas ML baselines were more robust, producing a performance crossover.
Meta-analysis identified distributional descriptors and structure–activity landscape indices (SALI) as predictors of task favorability.
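
For readers unfamiliar with SALI, the sketch below computes the index for a single pair of molecules using the commonly cited definition, SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)). It is an illustration only: the set-based fingerprints, the Tanimoto similarity, and the `eps` guard against division by zero are assumptions made here, and the study's exact fingerprinting and aggregation choices are not stated in this summary.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def sali(act_i: float, act_j: float, sim: float, eps: float = 1e-9) -> float:
    """SALI for one pair: activity difference divided by structural distance.

    eps is an assumption of this sketch; it avoids division by zero when
    two molecules have identical fingerprints (sim == 1).
    """
    return abs(act_i - act_j) / max(1.0 - sim, eps)


# Hypothetical feature sets standing in for real molecular fingerprints.
fp_a = {1, 2, 3}
fp_b = {2, 3, 4}

similarity = tanimoto(fp_a, fp_b)
print(similarity, sali(5.0, 7.0, similarity))
```

A large SALI value marks a pair of structurally similar molecules with very different property values, often called an activity cliff.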

Abstract

Large language models (LLMs) demonstrate strong performance in natural language tasks, but their capacity for genuine in‐context learning (ICL) in scientific regression remains unclear. We systematically assessed seven LLMs on molecular property prediction using a controlled framework of 56 transformed tasks that isolate shortcut learning and are designed to induce functional out‐of‐distribution (OOD) behavior. LLMs performed nearly perfectly on raw molecular weight prediction via shortcut cues but deteriorated under nonlinear transformations, whereas machine learning (ML) baselines showed greater robustness, yielding a performance crossover. Meta‐analysis revealed that distributional descriptors and structure–activity landscape indices (SALI) predict task favorability, providing a framework for selecting between LLM‐ and ML‐based approaches in chemistry.
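
To make the idea of a transformed task concrete, here is a minimal sketch of how a few-shot regression prompt could be built from molecule and value pairs after applying a transformation to the target. The transformation names, the prompt layout, and the `build_icl_prompt` helper are hypothetical; the 56 tasks and prompt format used in the study are not described in this summary.

```python
import math

# Toy (SMILES, molecular weight in g/mol) pairs, for illustration only.
EXAMPLES = [
    ("C", 16.04),          # methane
    ("CCO", 46.07),        # ethanol
    ("CC(=O)O", 60.05),    # acetic acid
    ("c1ccccc1", 78.11),   # benzene
]

# Hypothetical target transformations; not the study's actual task set.
TRANSFORMS = {
    "identity": lambda y: y,
    "log": lambda y: math.log(y),
    "sine": lambda y: math.sin(y / 10.0),
}


def build_icl_prompt(examples, query_smiles, transform):
    """Format labelled examples followed by an unlabelled query molecule."""
    blocks = [f"SMILES: {s}\nValue: {transform(y):.3f}" for s, y in examples]
    blocks.append(f"SMILES: {query_smiles}\nValue:")
    return "\n\n".join(blocks)


# Use the first three molecules as in-context examples, the fourth as the query.
prompt = build_icl_prompt(EXAMPLES[:3], EXAMPLES[3][0], TRANSFORMS["log"])
print(prompt)
```

Under the identity transformation a model can answer by recalling or summing atomic masses, which is the kind of shortcut the abstract describes; under a nonlinear transformation it has to infer the mapping from the examples in the prompt.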
