May 1, 2026 · Open Access

Document Structure in Large Language Models

Key Points

The study investigates how well large language models (LLMs) recognize document structure and how explicit structural information affects their performance on downstream tasks.
The evaluation covers three models: Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, and GPT-4o mini.
Documents were presented in four input formats: plain text, HTML, Markdown, and LaTeX.
Performance was analyzed on structure understanding tasks and on downstream tasks.
LLMs can develop a structural intuition without explicit structural information, but structured inputs significantly improve accuracy on structure understanding tasks.
Explicit structure gives a clear advantage in evidence selection, but its benefits are more limited in question answering and summarization.

Abstract

Long documents are crucial in knowledge transfer and typically employ a structured organization to facilitate comprehension. While LLMs can process long texts, it is unclear to what extent they recognize and utilize this structural information. Previous research has shown that document structure can improve downstream task performance in pre-trained language models, but its effect on LLMs remains underexplored. In this thesis, we systematically investigate the ability of LLMs to understand document structure and the impact of explicit structural information on downstream task performance. To this end, we evaluated Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, and GPT-4o mini by presenting documents in different input formats (plain text, HTML, Markdown, LaTeX) and analyzing their performance on structure understanding and downstream tasks. Our experimental results showed that LLMs can develop a structural intuition without explicit structural information; however, structured inputs significantly improve model accuracy in structure understanding tasks. The impact of incorporating explicit structure in documents differed across downstream tasks: while it provided a clear advantage in evidence selection, its benefits were more limited in question answering and summarization tasks.
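To make the four input formats concrete, the following is a minimal Python sketch of how one hierarchical document could be serialized as plain text, Markdown, HTML, and LaTeX before being passed to a model. The abstract does not describe the actual conversion pipeline, so the `Section` class, the `to_*` functions, and the sample document are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Section:
    """Hypothetical document node: a heading, its paragraphs, and nested subsections."""
    title: str
    paragraphs: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)


def to_plain(sec: Section) -> str:
    # Plain text: headings and paragraphs look alike, so structure is only implicit.
    parts = [sec.title, *sec.paragraphs]
    parts += [to_plain(child) for child in sec.children]
    return "\n\n".join(parts)


def to_markdown(sec: Section, level: int = 1) -> str:
    # Markdown: nesting depth is encoded by the number of '#' characters.
    parts = ["#" * level + " " + sec.title, *sec.paragraphs]
    parts += [to_markdown(child, level + 1) for child in sec.children]
    return "\n\n".join(parts)


def to_html(sec: Section, level: int = 1) -> str:
    # HTML: heading tags h1..h6 carry the depth; deeper levels are clamped to h6.
    tag = f"h{min(level, 6)}"
    parts = [f"<{tag}>{escape(sec.title)}</{tag}>"]
    parts += [f"<p>{escape(p)}</p>" for p in sec.paragraphs]
    parts += [to_html(child, level + 1) for child in sec.children]
    return "\n".join(parts)


LATEX_COMMANDS = ["section", "subsection", "subsubsection"]


def to_latex(sec: Section, level: int = 1) -> str:
    # LaTeX: sectioning commands carry the depth; deeper levels are clamped.
    # Simplification: special characters such as '%' or '&' are not escaped here.
    command = LATEX_COMMANDS[min(level, len(LATEX_COMMANDS)) - 1]
    parts = [f"\\{command}{{{sec.title}}}", *sec.paragraphs]
    parts += [to_latex(child, level + 1) for child in sec.children]
    return "\n\n".join(parts)


# Hypothetical sample document, used only to demonstrate the four renderings.
doc = Section(
    "Introduction",
    ["Long documents are structured."],
    [Section("Motivation", ["Structure aids comprehension."])],
)

for render in (to_plain, to_markdown, to_html, to_latex):
    print(f"--- {render.__name__} ---")
    print(render(doc))
```

In this sketch the plain-text rendering drops every explicit marker, which corresponds to the condition in which a model has to rely on structural intuition alone, while the other three renderings expose the same hierarchy through different markup conventions.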
