What question did this study set out to answer?

This evaluation aims to assess the performance of several LLMs in generating unit tests for Python code.

March 24, 2026 · Open Access

Leveraging Large Language Models (LLM) for Python Unit Test

Key Points

Evaluated six advanced Large Language Models (LLMs) for their code generation capabilities.
Tested each model’s ability to produce production-quality Python code.
Analyzed the comprehensiveness of unit tests generated alongside the code.
The evaluated LLMs varied in their ability to generate Python code.
Certain LLMs outperformed others in generating comprehensive unit tests.
Quality of code and tests varied, indicating a need for careful model selection.

Abstract

This study evaluates the capability of six state-of-the-art Large Language Models (LLMs), namely Perplexity AI, Claude Sonnet 4.5, Gemini 2.5 Pro, ChatGPT (GPT-5), DeepSeek-V3.2-Exp, and Llama-4-Maverick, to generate production-quality Python code with comprehensive unit tests.

Authors

Jiri Medlen

Emese Bari

Devarshi Tank

Cite this study

Medlen et al. studied this question.

synapsesocial.com/papers/69c2298daeb5a845df0d431b — DOI: https://doi.org/10.5281/zenodo.19170304

Also consider

Synapse has enriched 5 closely related papers on similar questions. Consider them for comparative context:

On the Evaluation of Large Language Models in Unit Test Generation · 2024 · 2 citations
Large Language Models as Test Case Generators: Performance Evaluation and Enhancement · 2024 · 8 citations
Unit Test Generation Using Large Language Models: A Systematic Literature Review · 2024 · 5 citations
Exploring Advanced Large Language Models with LLMsuite · 2024 · 1 citation
Automated Test Generation Using Large Language Models · 2025 · 2 citations
