What type of study is this?

This is a quantitative study.

October 12, 2025 | Open Access

Pt-HotpotQA: Evaluating Multi-Hop Question Answering on Original and Portuguese-translated Datasets Using LLMs

Key Points

Multilingual models consistently perform better in English than in Portuguese, highlighting language-specific challenges.
Fine-tuning large language models improves multi-hop question answering results on the Portuguese dataset.
The evaluation uses the HotpotQA benchmark in both its original English form and a publicly available Portuguese translation.
The performance gap between English and Portuguese narrows as model size increases.

Abstract

Multi-hop Question Answering (MHQA) advances Natural Language Processing by pushing models to combine information from multiple sources in a series of reasoning steps. Despite substantial advancements in MHQA for English, resources for evaluating Large Language Models (LLMs) in Portuguese remain scarce. To address this gap, we introduce a publicly available Portuguese translation of the HotpotQA dataset, a well-established English MHQA benchmark. We systematically evaluate several variants of the Llama multilingual LLM across both the original and translated datasets, analyzing performance variations by language. Our findings demonstrate that multilingual models consistently perform better in English than in Portuguese, though this gap narrows with increased model size. Additionally, we show the impact of fine-tuning on improving MHQA performance in Portuguese. This study provides valuable insights into optimizing LLMs for multilingual contexts and contributes a relevant benchmark for Portuguese-language MHQA research.
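To make the English-versus-Portuguese comparison concrete, here is a minimal sketch of how answer quality on a HotpotQA-style dataset might be scored per language. It uses exact match and token-level F1, the answer metrics commonly reported for HotpotQA; the abstract does not say which metrics the authors used, so treat that choice as an assumption. The function names and the toy prediction pairs are hypothetical and are not data from the paper. Unlike the English-oriented scoring scripts, this sketch does not strip articles, since that step would not carry over to Portuguese unchanged.

```python
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_scores(pairs):
    """Average EM and F1 over (prediction, gold) pairs."""
    n = len(pairs)
    em = sum(exact_match(p, g) for p, g in pairs) / n
    f1 = sum(token_f1(p, g) for p, g in pairs) / n
    return {"em": em, "f1": f1}


# Hypothetical toy predictions, one list per language (not from the paper).
english_pairs = [("Paris", "Paris"), ("Eiffel Tower", "Eiffel Tower")]
portuguese_pairs = [("Paris", "Paris"), ("Torre Eiffel", "a Torre Eiffel")]

english = mean_scores(english_pairs)
portuguese = mean_scores(portuguese_pairs)
# A positive gap means the English answers scored higher on this toy data.
f1_gap = english["f1"] - portuguese["f1"]
```

Running the same scoring over the original and translated splits for each model size would give one gap value per model, which is the quantity the abstract describes as narrowing with scale.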

Cite This Study

Mucciaccia et al. studied this question.

synapsesocial.com/papers/68ebe3d6becc64ad52fdaee7
https://doi.org/10.5753/jbcs.2025.5801
