This work introduces the Contextual Ethical Consistency Test (CECT), a bilingual multi-turn benchmark for evaluating the ethical consistency of large language models under contextual pressure. The benchmark examines whether a model's normative baseline persists across trajectory perturbations, including contextual drift, stake inversion, conversational reset, and cross-language variation. It is applied to nine commercial models in both Spanish and English. The study defines reproducible metrics including the Drift Index (ID), Reversibility Index (IR), Narrative Consistency Index (ICN), Cross-Language Index (ICL), and Stake Sensitivity Index (ISS). Results show that ethical consistency cannot be inferred from isolated responses; it must be evaluated as a property of the whole conversational trajectory. The work contributes a methodological and analytical framework for studying ethical robustness and susceptibility to contextual reconfiguration in conversational AI systems.
Evans Tovar studied this question.