What type of study is this?

This is a Literature Review study (also classified as: Observational).

August 26, 2025 · Open Access

Evaluating Agentic AI Systems: A Balanced Framework for Performance, Robustness, Safety and Beyond

Key Points

Agentic AI systems can yield productivity gains of 20–60%, yet their deployments often lack assessments of fairness and trust.
The proposed evaluation framework covers capabilities, robustness, safety and economic sustainability, among other dimensions.
Multidimensional evaluation combines automated metrics with human evaluations for a more comprehensive assessment.
This approach supports the responsible adoption of agentic AI in high-stakes domains, addressing overlooked sociotechnical dimensions.

Abstract

Agentic artificial intelligence (AI), meaning multi-agent systems that combine large language models with external tools and autonomous planning, is rapidly transitioning from research labs into high-stakes domains. Existing evaluations emphasise narrow technical metrics such as task success or latency, leaving important sociotechnical dimensions like human trust, ethical compliance and economic sustainability under-measured. We propose a balanced evaluation framework spanning five axes (capability & efficiency, robustness & adaptability, safety & ethics, human-centred interaction and economic & sustainability) and introduce novel indicators including goal-drift scores and harm-reduction indices. Beyond synthesising prior work, we identify gaps in current benchmarks, develop a conceptual diagram to visualise interdependencies and outline experimental protocols for empirically validating the framework. Case studies from recent industry deployments illustrate that agentic AI can yield 20–60% productivity gains, yet these deployments often omit assessments of fairness, trust and long-term sustainability. We argue that multidimensional evaluation, combining automated metrics with human-in-the-loop scoring and economic analysis, is essential for responsible adoption of agentic AI.
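
As a rough illustration of what a five-axis evaluation could look like in practice, the sketch below blends an automated metric with a human rating on each axis and reports a per-axis profile instead of a single number. It is not the paper's method: the axis identifiers, the `AxisScore` type, the [0, 1] normalisation and the equal default weighting are all assumptions made for this example, and the abstract does not define how goal-drift scores or harm-reduction indices are computed.

```python
from dataclasses import dataclass

# The five axes named in the abstract; the identifier spellings are assumed.
AXES = (
    "capability_efficiency",
    "robustness_adaptability",
    "safety_ethics",
    "human_centred_interaction",
    "economic_sustainability",
)


@dataclass(frozen=True)
class AxisScore:
    """Hypothetical container for one axis, both values normalised to [0, 1]."""
    automated: float  # e.g. a benchmark-derived metric
    human: float      # e.g. a human-in-the-loop rating

    def blended(self, human_weight: float = 0.5) -> float:
        # Linear blend; the weighting scheme is an assumption, not from the paper.
        return (1 - human_weight) * self.automated + human_weight * self.human


def evaluate(scores: dict[str, AxisScore], human_weight: float = 0.5) -> dict[str, float]:
    """Return a per-axis profile, refusing to run if any axis is unmeasured."""
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing axes: {missing}")
    return {axis: scores[axis].blended(human_weight) for axis in AXES}


def weakest_axis(profile: dict[str, float]) -> str:
    """Name the lowest-scoring axis so it is not hidden by an average."""
    return min(profile, key=profile.get)
```

Raising an error on a missing axis, rather than silently averaging over what is available, mirrors the abstract's concern that dimensions such as fairness and trust are often left unmeasured.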
