Large language models (LLMs) are increasingly deployed for multilingual information retrieval and reasoning over very long documents, yet they often struggle to extract dispersed facts and synthesize robust answers across linguistic boundaries. In this work, we propose a hybrid neural-symbolic framework that integrates scalable cross-lingual retrieval with explicit symbolic reasoning. Our approach, CROSS (Cross-lingual Retrieval Optimized for Scalable Solutions), efficiently narrows massive multilingual contexts using multilingual embeddings, dramatically improving retrieval accuracy and mitigating the “lost-in-the-middle” problem. Building on this, we introduce NeuroSymbolic Augmented Reasoning (NSAR), which prompts LLMs to extract structured facts and generate executable Python code, enabling deterministic and interpretable multitarget reasoning. We evaluate our methods on the mLongRR-V2 benchmark, spanning seven languages, 49 cross-lingual pairs, and documents up to 512,000 words. Our experiments show that, relative to neural-only baselines, CROSS raises retrieval accuracy to as high as 92% and NSAR reduces reasoning failures fivefold, while maintaining stable performance across languages and context sizes. These results establish a new standard for robust, scalable, and interpretable multilingual information extraction, demonstrating the promise of hybrid neural-symbolic architectures for future artificial intelligence systems.
Nezhad et al. studied this question.
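The two stages described above, embedding-based narrowing of the context followed by deterministic reasoning over extracted facts, can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors stand in for real multilingual embeddings, the `facts` list stands in for what an LLM might extract, and all names (`top_k_chunks`, `largest_value`, the entities and values) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Stage 1 (retrieval): toy vectors standing in for multilingual embeddings.
# In a real system these would come from an embedding model (assumption).
chunk_vecs = [
    [0.9, 0.1, 0.0],  # chunk 0: close to the query
    [0.0, 1.0, 0.0],  # chunk 1: unrelated filler
    [0.8, 0.2, 0.1],  # chunk 2: close to the query
    [0.0, 0.0, 1.0],  # chunk 3: unrelated filler
]
query_vec = [1.0, 0.0, 0.0]
selected = top_k_chunks(query_vec, chunk_vecs, k=2)

# Stage 2 (symbolic reasoning): structured facts that an LLM might extract
# from the chunks (hypothetical), each tagged with its source chunk.
facts = [
    {"entity": "Doha", "value": 41, "chunk": 0},
    {"entity": "Lima", "value": 77, "chunk": 2},
]

def largest_value(facts):
    """Deterministic multitarget step: pick the fact with the largest value."""
    best = max(facts, key=lambda f: f["value"])
    return best["entity"], best["value"]

# Only facts from retrieved chunks take part in the reasoning step.
answer = largest_value([f for f in facts if f["chunk"] in selected])
print(selected, answer)
```

Because the final comparison runs as ordinary code rather than as free-form generation, the same facts always yield the same answer, which is the sense in which the reasoning step is deterministic and inspectable.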