Contemporary hackathon adjudication is burdened by four structural deficiencies inherent to human-centric evaluation: inconsistent rubric application, substantial inter-rater score variance, prohibitive assessment latency, and a near-total absence of granular, actionable post-event diagnostic feedback. This paper introduces HackEval, a production-grade multi-agent artificial intelligence framework designed to systematically resolve these limitations through real-time, bias-mitigated evaluation across heterogeneous project submission modalities. Six functionally specialized agents operate in parallel: (i) a Code Quality Agent performing deep multi-criterion static analysis on GitHub repositories; (ii) a Presentation Analyzer Agent implementing a four-stage pipeline that integrates LLM semantic reasoning with contrastive vision-language embeddings; (iii) a UI/UX Evaluation Agent leveraging CLIP-based aesthetic regression [13]; (iv) an Innovation Agent quantifying originality via semantic embedding distance; (v) a Feasibility Agent applying chain-of-thought LLM reasoning [12]; and (vi) a Plagiarism Detection Agent employing sentence-transformer cosine similarity with FAISS indexing [15]. Empirical evaluation on 27 authentic hackathon submissions yields a Pearson correlation of r = 0.93 between AI composite scores and consensus expert evaluations, a 92.8% reduction in per-team evaluation time, and a 77.4% improvement in cross-evaluator scoring consistency. The platform is delivered as a multi-tenant Software-as-a-Service (SaaS) system built on a MERN + FastAPI + LangChain stack and scales to more than 500 concurrent teams with feedback delivered in under 10 seconds.
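The abstract does not specify how the Plagiarism Detection and Innovation agents turn embeddings into scores. The following is a minimal sketch under stated assumptions, not HackEval's implementation: submission embeddings are assumed to have already been produced by a sentence-transformer model, the nearest-neighbour search that FAISS would perform is written out in plain Python (cosine similarity as the inner product of L2-normalized vectors, the quantity a flat inner-product index ranks by), "semantic embedding distance" is assumed to mean 1 minus the maximum similarity to any prior submission, and the function names and the 0.85 flagging threshold are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity computed as the inner product of L2-normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def screen_submission(
    query: Vector,
    corpus: Dict[str, Vector],
    threshold: float = 0.85,  # hypothetical plagiarism cut-off
) -> Tuple[str, float, float, bool]:
    """Brute-force nearest-neighbour search over precomputed embeddings.

    Returns (closest_id, max_similarity, novelty, flagged), where novelty is the
    assumed originality measure 1 - max_similarity and flagged marks a possible
    plagiarism hit. A FAISS flat index would perform the same exhaustive search
    faster; it is omitted here to keep the sketch dependency-free.
    """
    if not corpus:
        raise ValueError("corpus must contain at least one prior submission")
    best_id, best_sim = "", -1.0
    for doc_id, emb in corpus.items():
        sim = cosine(query, emb)
        if sim > best_sim:  # strict '>' keeps the first match on ties
            best_id, best_sim = doc_id, sim
    novelty = 1.0 - best_sim
    return best_id, best_sim, novelty, best_sim >= threshold


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real sentence-transformer vectors are far larger.
    prior = {"team_a": [1.0, 0.0, 0.0], "team_b": [0.0, 1.0, 0.0]}
    print(screen_submission([0.9, 0.1, 0.0], prior))  # near-duplicate of team_a
    print(screen_submission([0.0, 0.0, 1.0], prior))  # orthogonal to both priors
```

Because exhaustive search costs one dot product per prior submission, it is adequate at the scale of a single event; an approximate FAISS index becomes worthwhile only when comparing against large historical corpora.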
Tripathi et al. (Mon,) studied this question.