Parent-child scaffolding interactions are foundational yet fleeting, which makes self-monitoring difficult for caregivers. We present GABRIEL, a multi-agent generative AI (GenAI) pipeline conceived as a reflective mirror to augment parental agency in dialogical book-sharing. This work is a proof of concept that establishes feasibility, traceability, and preliminary reliability; it is not a fully validated monitoring tool. The evaluation uses 52 recordings from 42 predominantly mother-child dyads in a single Colombian region, which constrains generalizability. Agreement with a human-coded benchmark (trained undergraduate raters working under supervision) is reported against a priori criteria defined in the Methods: exact and within-±1 agreement on a 1–9 scale, ordinal concordance (Kendall’s τ-b and Spearman’s ρ), and calibration (slope and intercept). Exact and within-±1 agreement remain modest, and the system’s evolution reveals a critical trade-off: Stage 2 achieves the lowest point-wise error (mean squared error, MSE), whereas Stage 3 offers a more theoretically grounded architecture at the cost of slightly higher error. This tension is central to our findings. Ordinal concordance and calibration are mixed and generally fall below our a priori thresholds; we therefore restrict claims to cautious, reflective (non-judgmental) use and highlight the need for further calibration. We also outline system governance, minimal computational requirements, and pathways for cultural and linguistic adaptation. Future work will benchmark the system against expert developmental psychologists and conduct cross-cultural replications to consolidate measurement validity. A related external validation study is currently under review (blinded) and is not reported here.
Amorocho et al. studied this question.
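The agreement criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `agreement_summary`, the example scores, and the choice to regress human scores on model scores for calibration are all assumptions made here for illustration, and the formulas are the textbook definitions rather than anything specified in this section. Spearman's ρ is omitted for brevity.

```python
from itertools import combinations
from math import sqrt


def agreement_summary(human, model):
    """Summarise agreement between two lists of integer ratings (hypothetical helper)."""
    if len(human) != len(model) or len(human) < 2:
        raise ValueError("need two equally long rating lists with at least 2 items")
    n = len(human)
    diffs = [m - h for h, m in zip(human, model)]

    # Point-wise criteria: exact match, within one scale point, mean squared error.
    exact = sum(d == 0 for d in diffs) / n
    within_one = sum(abs(d) <= 1 for d in diffs) / n
    mse = sum(d * d for d in diffs) / n

    # Kendall's tau-b: concordant minus discordant pairs, corrected for ties.
    conc = disc = tied_h_only = tied_m_only = 0
    for (h1, m1), (h2, m2) in combinations(zip(human, model), 2):
        dh, dm = h1 - h2, m1 - m2
        if dh == 0 and dm == 0:
            continue  # tied on both raters: excluded from both denominator terms
        if dh == 0:
            tied_h_only += 1
        elif dm == 0:
            tied_m_only += 1
        elif dh * dm > 0:
            conc += 1
        else:
            disc += 1
    denom = sqrt((conc + disc + tied_h_only) * (conc + disc + tied_m_only))
    tau_b = (conc - disc) / denom if denom else None

    # Calibration (assumption: least-squares fit of human = intercept + slope * model).
    mean_h, mean_m = sum(human) / n, sum(model) / n
    var_m = sum((m - mean_m) ** 2 for m in model)
    if var_m:
        cov = sum((m - mean_m) * (h - mean_h) for h, m in zip(human, model))
        slope = cov / var_m
        intercept = mean_h - slope * mean_m
    else:
        slope = intercept = None  # constant model scores: slope is undefined

    return {
        "exact": exact,
        "within_one": within_one,
        "mse": mse,
        "kendall_tau_b": tau_b,
        "calibration_slope": slope,
        "calibration_intercept": intercept,
    }


if __name__ == "__main__":
    # Made-up ratings on a 1-9 scale, for illustration only.
    human_scores = [3, 5, 7, 4, 6]
    model_scores = [3, 6, 5, 4, 8]
    for name, value in agreement_summary(human_scores, model_scores).items():
        print(name, value)
```

Under this convention a perfectly calibrated system would show a slope near 1 and an intercept near 0; the thresholds against which the reported values are judged are those defined in the Methods, not in this sketch.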