Abstract
Advances in large language models (LLMs) have deepened the understanding of numerical, logical, and quantitative reasoning across domains. This study presents a comprehensive analysis of OpenAI’s O1 and DeepSeek’s R1, evaluating reasoning capabilities, computational efficiency, and ethical considerations in academic settings. Performance was tested using benchmark evaluations (MMLU, MATH, AIME, and HumanEval). The results indicate that OpenAI’s O1, with its dense transformer and chain-of-thought (CoT) framework, is better suited to HumanEval and MBPP. DeepSeek’s R1, which uses a Mixture-of-Experts (MoE) architecture, was more efficient on MATH and AIME tasks for STEM. The study further shows that OpenAI’s proprietary model reduces bias using reinforcement learning, while the DeepSeek framework applies content restrictions in line with regional regulations. These insights can guide decision-making on the use of AI models in academic settings while maintaining a balance with task-related performance.
Chakraborty et al. (Thu,) studied this question.