Abstract: This study proposes a unified framework for evaluating generative AI–enhanced content by integrating qualitative, quantitative, and mixed-methods designs within a single, audit-ready protocol. A concept analysis distinguishes six content-quality constructs (factual accuracy, coherence, originality, utility, safety, and equity) and maps them to observable indicators and error taxonomies. Building on this foundation, the study articulates a four-layer framework comprising (i) construct-to-metric alignment with rubric design and codebooks, (ii) quantitative scoring with reliability and validity checks, (iii) qualitative adjudication for nuance and context, and (iv) mixed-methods triangulation that fuses measurements through pre-registered aggregation rules. The central problem addressed is the inconsistency and opacity of current evaluation practices, which impede comparability across tasks, models, and domains. The methodology employs multi-rater protocols with calibration rounds, item-response and generalizability modeling for reliability, bias and harm screens, and causal impact estimation (e.g., randomized or staggered exposure) where outcome effects are measurable. A decision ledger and an evidence bundle are specified to ensure traceability and reproducibility. Results from pilot applications indicate improved inter-rater reliability, tighter construct validity, and greater sensitivity to distribution shift compared with baseline rubric-only approaches. The framework's impact arises from standards-compatible reporting, governance-ready artifacts, and procurement-grade comparability across systems. The implications include clearer accountability for content risks and benefits, scalable human-in-the-loop oversight, and a practical route from principle statements to measurable, decision-relevant evidence.
Keywords: generative AI evaluation, mixed-methods, qualitative assessment, quantitative metrics, content validity, reliability analysis, triangulation, human-in-the-loop, bias and safety, causal impact, rubric design, auditability, governance and compliance
Murali Krishna Pasupuleti studied this question.