August 12, 2025

An Empirical Study on Language Models for Generating Log Statements in Test Code

Key Points

MAIN FINDING: Large language models like GPT-3.5-Turbo can effectively generate log statements in test code.
KEY EVIDENCE: GPT-3.5-Turbo outperformed the best pre-trained models on log position prediction by 33.97%.
APPROACH: The study analyzed over 5 million Java test methods from 6,405 GitHub projects.
SIGNIFICANCE: Findings advance automated logging practices, particularly in enhancing software maintenance through improved log generation.

Abstract

Log statements play a critical role in modern software development, capturing essential runtime information necessary for software maintenance. Recently, new techniques have been developed to automate logging activities, allowing log statements to be injected into code by identifying specific code locations, selecting the appropriate log level, and generating meaningful log messages that describe the behavior being logged. Although automated logging in production code has attracted significant attention, little focus has been given to the injection of logs in test code. To fill this gap, we conduct an empirical study on 5,206,759 Java test methods collected from 6,405 GitHub projects to explore the effectiveness and limitations of Pre-trained Language Models (PLMs) and Large Language Models (LLMs) for generating and injecting test log statements. Our findings demonstrate that general-purpose LLMs like GPT-3.5-Turbo, when properly instructed to inject logging statements in test methods, perform comparably to the best-performing PLMs on predicting log level. Additionally, GPT-3.5-Turbo substantially outperforms the best PLMs on predicting log position, with a 33.97% improvement, while also achieving superior performance in predicting log messages in terms of BLEU and ROUGE. This work takes the first step toward evaluating the capability of PLMs and LLMs to generate test log statements. To facilitate future research, we have open-sourced all data and source code used in this work.
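
The three sub-tasks named in the abstract (log position, log level, log message) can be pictured as three fields that together determine one injected statement. The following is a minimal illustrative sketch, not the study's pipeline: the `LogStatement` type, the `inject` helper, the `LOGGER` variable name, and the sample Java test method are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogStatement:
    """One predicted log statement (hypothetical representation)."""
    position: int  # 0-based index of the source line the log is placed after
    level: str     # log level, e.g. "debug", "info", "error"
    message: str   # natural-language log message


def inject(method_lines, log, logger="LOGGER"):
    """Return a copy of method_lines with the log statement inserted.

    The new line reuses the indentation of the line it follows.
    The logger variable name is an assumption, not taken from the study.
    """
    anchor = method_lines[log.position]
    indent = anchor[: len(anchor) - len(anchor.lstrip())]
    # Escape backslashes and quotes so the message is a valid Java string literal.
    escaped = log.message.replace("\\", "\\\\").replace('"', '\\"')
    statement = f'{indent}{logger}.{log.level}("{escaped}");'
    return (
        method_lines[: log.position + 1]
        + [statement]
        + method_lines[log.position + 1:]
    )


# A made-up Java test method, stored as a list of source lines.
test_method = [
    "@Test",
    "public void testConnect() {",
    "    Client client = new Client(HOST, PORT);",
    "    client.connect();",
    "    assertTrue(client.isConnected());",
    "}",
]

# A made-up prediction covering all three sub-tasks.
predicted = LogStatement(position=3, level="info", message="Connected to test server")

for line in inject(test_method, predicted):
    print(line)
```

In this framing, position and level can be scored by exact match against the developer-written statement, while the message is free text, which is why the abstract reports BLEU and ROUGE for it.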
