The formal linguistic capabilities of Large Language Models (LLMs) are increasingly intersecting with the field of Explainable Agency (XAg). The growing adoption of LLM-agents has heightened the need to explain their behaviour to users. However, current methods consistently fail to meet desirable properties of human-centric explanations. Moreover, language models are being used to improve traditional agent explainability methods because of their conversational abilities and their resemblance to human responses, yet they are often used without sufficient consideration of their limitations. In this paper, we show how the inclusion of LLM components in the agent architecture affects the reliability of the explanations produced. We argue that the widespread reliance on LLMs in XAg is polluting the definition of explanation and explainability. We question current methods of agentic explainability and argue that their risks may undermine trust. Through an architectural analysis and an empirical illustration, we highlight how certain design choices may limit the kinds of explanatory queries that can be reliably answered. Finally, we propose what human-oriented explainability should entail in LLM-agents, and we expose the limitations and opportunities of integrating LLMs into agent explainability.
Sara Montese
Sergio Alvarez-Napagao
Victor Gimenez-Abalos
Universitat Politècnica de Catalunya
Barcelona Supercomputing Center
Montese et al. (Tue,) studied this question.
www.synapsesocial.com/papers/69f2f1dc1e5f7920c6387771 — DOI: https://doi.org/10.5281/zenodo.19353067