Evaluating handwritten examination scripts is labour-intensive, introduces inter-evaluator subjectivity, and limits institutional scalability as student enrolment grows. This paper presents ExamAI, a full-stack web application that automates the complete examination lifecycle: scheduling, submission of handwritten answer images, text extraction by optical character recognition (OCR), natural language processing (NLP) scoring, and result publication. The scoring engine combines a BERT semantic similarity score weighted at 60% with a TF-IDF cosine similarity score weighted at 40%. The system is implemented as three independently deployable microservices: a React.js frontend, a Node.js Express backend, and a Python Flask machine learning service backed by MongoDB. Empirical scoring analysis across 60 descriptive answer pairs shows that the dual-model approach substantially outperforms a TF-IDF-only baseline on paraphrased correct answers, raising their scores from 10–35% to 37–62%.
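The 60/40 weighting described above can be illustrated with a minimal sketch. This is not the ExamAI implementation: the function names `tfidf_cosine` and `combined_score` are hypothetical, the TF-IDF vectors are built over only the two texts being compared with a smoothed IDF chosen for illustration, and the BERT similarity is assumed to be computed elsewhere and passed in as a number in [0, 1].

```python
import math
from collections import Counter

# Weights taken from the abstract; everything else here is an assumption.
BERT_WEIGHT = 0.6
TFIDF_WEIGHT = 0.4


def tfidf_cosine(reference: str, answer: str) -> float:
    """Cosine similarity of TF-IDF vectors built over just these two texts."""
    docs = [reference.lower().split(), answer.lower().split()]
    vocab = set(docs[0]) | set(docs[1])
    n_docs = len(docs)
    # Smoothed inverse document frequency (an illustrative choice).
    idf = {
        term: math.log((1 + n_docs) / (1 + sum(term in d for d in docs))) + 1
        for term in vocab
    }
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({term: counts[term] * idf[term] for term in counts})
    dot = sum(w * vectors[1].get(term, 0.0) for term, w in vectors[0].items())
    norm = math.sqrt(sum(w * w for w in vectors[0].values())) * math.sqrt(
        sum(w * w for w in vectors[1].values())
    )
    # An empty text has a zero vector; treat that as no similarity.
    return dot / norm if norm else 0.0


def combined_score(semantic_sim: float, lexical_sim: float) -> float:
    """Blend a semantic and a lexical similarity, each clipped to [0, 1]."""
    semantic_sim = min(max(semantic_sim, 0.0), 1.0)
    lexical_sim = min(max(lexical_sim, 0.0), 1.0)
    return BERT_WEIGHT * semantic_sim + TFIDF_WEIGHT * lexical_sim


if __name__ == "__main__":
    reference = "photosynthesis converts light energy into chemical energy"
    paraphrase = "plants turn sunlight into stored chemical fuel"
    lexical = tfidf_cosine(reference, paraphrase)
    # 0.8 stands in for a BERT similarity; it is a made-up value.
    print(f"lexical={lexical:.2f} combined={combined_score(0.8, lexical):.2f}")
```

A paraphrased answer shares few exact terms with the reference, so its lexical score is low, and the semantic term carries most of the blended score. With a hypothetical semantic similarity of 0.8 and a lexical similarity of 0.2, for example, the blend is roughly 0.56.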
Pushadapu et al. studied this question.