The peer review process serves as a critical gatekeeper in scholarly communication: it provides constructive feedback, establishes the credibility of research, and validates the scientific claims and overall quality of research papers. However, human reviews are often subjective and inconsistent. Because reviewing is voluntary, reviewers may not always devote enough time to evaluating manuscripts thoroughly, which leaves the process vulnerable to bias and lackluster evaluations. Recent advances in Large Language Models (LLMs) make them a promising testbed for automating or augmenting peer review, either complementing human reviewers or serving as a benchmark against them. However, the extent to which these models can replicate human evaluation, particularly in critical depth, reasoning accuracy, and alignment with human decision-making, remains largely unexplored. To investigate this question, we introduce Co-Reviewer, an agentic AI framework composed of four specialized agents that work together to generate, evaluate, critique, and refine peer reviews. We also conduct a multi-dimensional evaluation comparing LLM-generated reviews with human-written reviews, using metrics such as content informativeness, sentiment polarity and variability, score consistency, and alignment with final editorial decisions. Our results show that while LLMs can produce well-written and clear reviews, they exhibit consistent problems: overconfidence, a bias toward acceptance, and difficulty adjusting to changes in manuscripts. In addition, LLMs often confuse linguistic fluency with substantive critique, missing the nuanced, context-sensitive reasoning found in expert human assessments.
To address these limitations, we propose several enhancements: domain-adaptive fine-tuning on peer review datasets, structured aspect-based critique generation, sentiment modulation for more calibrated feedback, and hybrid pipelines that combine LLM outputs with human oversight. Our work contributes to the growing body of research on AI-assisted scholarly evaluation and underscores both the potential and the limitations of using LLMs as co-reviewers in academic publishing workflows. The dataset and code needed to replicate our findings are publicly available at https://github.com/PrabhatkrBharti/Co-Reviewer.git.
Prabhat Kumar Bharti
Viral Dalal
Mihir Panchal
Scientometrics
Bennett University
Dwarkadas J. Sanghvi College of Engineering
www.synapsesocial.com/papers/69926a620d0ce0adc9976a20; DOI: https://doi.org/10.1007/s11192-026-05557-6