Log statements play a critical role in modern software development, capturing essential runtime information necessary for software maintenance. Recently, new techniques have been developed to automate logging activities, allowing log statements to be injected into code by identifying specific code locations, selecting the appropriate log level, and generating meaningful log messages that describe the behavior being logged. Although automated logging in production code has attracted significant attention, little focus has been given to the injection of logs in test code. To fill this gap, we conduct an empirical study on 5,206,759 Java test methods collected from 6,405 GitHub projects to explore the effectiveness and limitations of Pre-trained Language Models (PLMs) and Large Language Models (LLMs) for generating and injecting test log statements. Our findings demonstrate that general-purpose LLMs like GPT-3.5-Turbo, when properly instructed to inject logging statements in test methods, perform comparably to the best-performing PLMs on predicting log level. Additionally, GPT-3.5-Turbo substantially outperforms the best-performing PLMs on predicting log position, with a 33.97% improvement, while also achieving superior performance in predicting log messages in terms of BLEU and ROUGE. This work takes the first step toward evaluating the capability of PLMs and LLMs to generate test log statements. To facilitate future research, we have open-sourced all data and source code used in this work.
Shu et al. studied this question.