What does this research mean for the field?

The Dynamic Contextual Responsibility (DCR) framework reveals that approximately 22% of outputs classified as responsible under static metrics are reclassified when contextual and temporal factors are considered, highlighting latent ethical and governance risks in large language models.

What question did this study set out to answer?

The research aims to develop a framework for evaluating AI responsibility that considers dynamic context and governance.

March 10, 2026 · Open Access

A dynamic contextual responsibility framework for evaluating large language models in socio-technical contexts

Key points

The research aims to develop a framework for evaluating AI responsibility that considers dynamic context and governance.
The study introduces the Dynamic Contextual Responsibility (DCR) framework.
DCR integrates five dimensions: ethical foundations, contextual grounding, behavioural properties, governance mechanisms, and temporal dynamics.
The framework is examined through multi-model, multi-context, and multi-temporal evaluations using benchmarks such as TruthfulQA, FEVER, and HotpotQA.
About 22% of outputs deemed responsible by static metrics were reclassified when contextual and temporal factors were included.
The analysis uncovered hidden ethical and governance risks in AI outputs.

Abstract

Current Responsible AI metrics, including truthfulness, bias, and toxicity scores, often reduce responsibility in large language models (LLMs) to static technical proxies, obscuring the contextual, ethical, and temporal dynamics through which accountability is produced in real-world settings. This study introduces Dynamic Contextual Responsibility (DCR), a conceptual and operational framework that defines responsibility as a dynamic, context-conditioned, and socio-technical relation shaped by system behaviour, governance arrangements, and institutional norms. DCR integrates five dimensions (ethical foundations, contextual grounding, behavioural properties, governance mechanisms, and temporal dynamics) into a unified and interpretable construct. To illustrate its operational implications, the framework is examined through multi-model, multi-context, and multi-temporal evaluations using established benchmarks such as TruthfulQA, FEVER, and HotpotQA. The analysis shows that approximately 22% of outputs classified as responsible under static metrics are reclassified once contextual and temporal factors are considered, revealing latent ethical and governance risks. By foregrounding context, governance, and temporal change, DCR advances Responsible AI evaluation toward more dynamic, transparent, and plural forms of accountability, with direct relevance for emerging regulatory regimes, including the EU AI Act and the NIST AI Risk Management Framework.
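
The abstract does not say how the reclassification share is computed. The sketch below is one plausible reading, not the authors' method: it assumes each output carries a binary verdict from static metrics and a second verdict from a context-aware assessment, and it takes the statically "responsible" outputs as the denominator. The `Judgement` type, its field names, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """One model output with two verdicts (both fields are hypothetical)."""
    output_id: str
    static_responsible: bool      # verdict from static metrics alone
    contextual_responsible: bool  # verdict once context and time are considered


def reclassification_rate(judgements):
    """Share of statically 'responsible' outputs whose verdict flips.

    Assumption: the denominator is the set of outputs that static
    metrics classify as responsible, not the full set of outputs.
    """
    static_ok = [j for j in judgements if j.static_responsible]
    if not static_ok:
        return 0.0
    flipped = [j for j in static_ok if not j.contextual_responsible]
    return len(flipped) / len(static_ok)


# Toy data for illustration only; these are not results from the study.
sample = [
    Judgement("a", True, True),
    Judgement("b", True, False),
    Judgement("c", True, True),
    Judgement("d", False, False),
    Judgement("e", True, True),
]

print(f"{reclassification_rate(sample):.0%}")
```

On this toy sample, four outputs pass the static check and one of them flips, so the rate is one in four. Under the same reading, the reported figure of approximately 22% would mean that about one in five statically approved outputs changes verdict.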
