Conversational AI agents are now a routine touchpoint in e-commerce customer service, and AI empathy has emerged as the headline humanization strategy for repairing relational damage during service failures. Yet a growing evidence base reports that empathic AI often backfires, because consumers cannot reconcile felt warmth with their lay model of what an artificial agent is. This research asks under what conditions AI empathy can be made credible to consumers. We propose that mechanistic interpretability, operationalized in the present studies as a consumer-facing visualization of an AI agent’s internal emotion-vector activations designed in the style of mechanistic-interpretability research, operates as a costly authenticity signal that rehabilitates empathic AI by enabling an attributional shift along the experience dimension of mind perception. Signaling Theory carries the antecedent stage of the causal chain, where mechanistic interpretability serves as a verifiable cue of computational authenticity. Mind Perception Theory carries the downstream stage, where the authenticated empathy is converted into consumer-brand intimacy. Two between-subjects experiments, preceded by a feasibility pilot, tested the account on Mainland Chinese consumers recruited via the Credamo online panel. Study 1 used a single-factor design contrasting high versus low AI empathy. Study 2 used a two (AI empathy) by two (mechanistic interpretability) full factorial design. Study 1 showed a pattern consistent with high (versus low) AI empathy lowering brand intimacy through reduced perceived authenticity. Study 2 replicated the AI-empathy backfire when interpretability was absent, reversed the sign of the AI-empathy slope on the perceived-authenticity mediator when interpretability was present, and neutralized the negative conditional indirect effect on brand intimacy through perceived authenticity.
The findings introduce mechanistic interpretability to consumer-marketing scholarship as a manipulable signaling channel, document a structural reversal in the mediator-stage slope coupled with neutralization of the indirect effect on the relational outcome, and prescribe pairing empathic AI phrasing with mechanistic-transparency design rather than deploying empathy without an accompanying transparency cue.
Mi et al. (Fri,) studied this question.
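
The conditional indirect effect described for Study 2 can be illustrated with a minimal sketch of first-stage moderated mediation. It is not the authors' analysis code. It assumes 0/1 dummy coding for AI empathy (`x`) and mechanistic interpretability (`w`), ordinary least squares for both models, and moderation on the empathy-to-authenticity path only. The function name and all variable names are hypothetical, and a real analysis would also need bootstrap confidence intervals, which are omitted here.

```python
import numpy as np


def conditional_indirect_effects(x, w, m, y):
    """Sketch of first-stage moderated mediation (hypothetical helper).

    x: AI empathy (0 = low, 1 = high), assumed dummy coded
    w: mechanistic interpretability (0 = absent, 1 = present), assumed dummy coded
    m: perceived authenticity (mediator)
    y: brand intimacy (outcome)
    """
    x, w, m, y = (np.asarray(v, dtype=float) for v in (x, w, m, y))
    ones = np.ones_like(x)

    # Mediator model: m = a0 + a1*x + a2*w + a3*x*w
    a, *_ = np.linalg.lstsq(np.column_stack([ones, x, w, x * w]), m, rcond=None)

    # Outcome model: y = b0 + c*x + b1*m
    # (assumption: the direct and second-stage paths are not moderated)
    b, *_ = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)

    slope_absent = a[1]           # empathy -> authenticity slope at w = 0
    slope_present = a[1] + a[3]   # empathy -> authenticity slope at w = 1
    b1 = b[2]                     # authenticity -> intimacy slope

    return {
        "mediator_slope_absent": slope_absent,
        "mediator_slope_present": slope_present,
        "indirect_absent": slope_absent * b1,
        "indirect_present": slope_present * b1,
    }
```

Under these assumptions, a sign reversal at the mediator stage corresponds to `mediator_slope_absent` and `mediator_slope_present` having opposite signs, while the indirect effect on brand intimacy is each slope multiplied by the authenticity-to-intimacy coefficient.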