The increasing usage of large language models for code generation raises concerns regarding their computational costs and ecological impact. This study evaluates the environmental efficiency of several cutting-edge large language models, including ChatGPT, Claude, Copilot, DeepSeek, Gemini, Mistral, and Qwen, across algorithm and data structure tasks in Python, C++, and Java, selected from HackerRank to ensure practical relevance. A multi-metric, sustainability-focused evaluation framework is proposed, measuring execution time, peak memory usage, energy consumption, and carbon footprint. The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) is applied to combine algorithm and data structure metrics into scores for each programming language, which are then normalized across models and averaged across languages to compute the GreenAI Efficiency Score. This unified score enables fair, comprehensive ranking of models, promoting environmentally responsible AI selection in software development.
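The aggregation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify criterion weights or the normalization used across models, so equal weights and min-max scaling are assumed here, and the names `topsis_cost`, `min_max`, and `green_score` as well as all numbers in `DEMO` are hypothetical. All four metrics (execution time, peak memory, energy, carbon) are treated as cost criteria, where lower is better.

```python
import math


def topsis_cost(matrix, weights=None):
    """TOPSIS closeness scores in [0, 1], one per row (model).

    Every column is treated as a cost criterion (lower is better).
    Equal weights are an assumption, not taken from the paper.
    """
    n_crit = len(matrix[0])
    weights = weights or [1.0 / n_crit] * n_crit
    # Vector-normalise each column; guard against an all-zero column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    # For cost criteria the ideal point is the column minimum.
    best = [min(col) for col in zip(*v)]
    worst = [max(col) for col in zip(*v)]
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        # If all models are identical, no preference can be expressed.
        scores.append(d_worst / total if total else 0.5)
    return scores


def min_max(values):
    """Rescale to [0, 1] across models (assumed normalisation)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) if hi > lo else 1.0 for x in values]


def green_score(per_language):
    """Average the normalised per-language TOPSIS scores for each model."""
    per_lang = [min_max(topsis_cost(m)) for m in per_language.values()]
    return [sum(s) / len(s) for s in zip(*per_lang)]


# Made-up measurements for three hypothetical models (rows).
# Columns: time (s), peak memory (MB), energy (J), carbon (gCO2e).
DEMO = {
    "python": [[1.0, 50.0, 10.0, 0.5],
               [2.0, 100.0, 20.0, 1.0],
               [1.5, 75.0, 15.0, 0.75]],
    "java": [[1.2, 60.0, 12.0, 0.6],
             [1.1, 55.0, 11.0, 0.55],
             [2.4, 120.0, 24.0, 1.2]],
}

if __name__ == "__main__":
    for name, score in zip(["model_a", "model_b", "model_c"], green_score(DEMO)):
        print(f"{name}: {score:.3f}")
```

A higher score means a model's generated code sits closer to the ideal (lowest-cost) point across the languages considered. Changing the weights or the cross-model normalization would change the ranking, which is why both are flagged as assumptions.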
Tunzina et al. studied this question.