Information Extraction (IE) transforms unstructured documents into structured information and is essential for accelerating data-driven materials discovery. However, materials science literature poses unique challenges for IE because it contains specialized terminology, ambiguous material expressions, complex experimental descriptions, and heterogeneous data formats. Recently, Large Language Models (LLMs) have shown strong potential for extracting and organizing materials knowledge. However, their effectiveness on domain-specific IE tasks, and their ability to address persistent extraction challenges, have not yet been systematically synthesized. This survey provides a systematic review of LLM-based IE in materials science. Unlike broader reviews of LLMs for materials discovery, it focuses specifically on the information-extraction layer that converts unstructured materials literature into structured, reusable knowledge. We introduce a taxonomy of key IE tasks, ranging from Named Entity Recognition (NER) and Relation Extraction (RE) to multi-modal extraction, classification, and generative data structuring. We then review the major methodological paradigms: prompting, fine-tuning, retrieval-augmented generation, and ensemble or agent-based methods. We also cover domain-specific datasets, pretrained and foundation models, tools, and evaluation protocols. Overall, this survey organizes the field into six task categories, four methodological paradigms, three resource groups, and three representative application scenarios. Finally, we identify six major challenges for future research: (1) inconsistent material terminology, (2) uneven domain coverage across materials subfields, (3) complex multi-modal data integration, (4) domain-specific data scarcity, (5) practical deployment of LLM-based IE systems, and (6) the need for rigorous evaluation protocols that include fact checking.
This work provides a structured guide for developing reliable LLM-based IE pipelines for materials science.
Duan et al. studied this question.