Abstract
Programmers rely on code documentation and comments to understand source code, and program comprehension tasks consume a significant portion of development time. Despite their importance, the impact of comments on program comprehension remains debated. Our study addresses this gap. Employing a mixed-methods approach, we conducted an eye-tracking study with 20 computer science students to investigate how code comments influence program comprehension. By analyzing both quantitative and qualitative data, we aim to assess this influence across multiple aspects of program comprehension. The quantitative data comprise behavioral metrics of program comprehension (correctness and response time) and gaze data on visual attention, linearity of reading order, and gaze strategies. These were complemented by participants’ ratings of perceived difficulty and of the contribution of comments. Additionally, a post-questionnaire captured participants’ experiences, providing qualitative insights into the effectiveness of comments, navigation strategies, and overall experiences with comments. Our findings reveal that the effect of comments on program comprehension varies significantly across code snippets, ranging from a 30% decrease to a 34% increase in performance. Comments significantly guided visual attention, accounting for up to 23% of fixations, and promoted a more linear reading approach. Participants predominantly adhered to a “code-first” strategy. Moreover, participants rated comments positively for clarifying complex segments of code and contributing to program comprehension. However, this favorable perception did not consistently translate into improved performance or reduced perceived difficulty across snippets.
Based on our findings, we propose avenues for future research, including comparative studies on automated versus human-generated comments and the development of predictive models for assessing comment usefulness.
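The abstract reports three kinds of quantities: the share of fixations landing on comments, the linearity of reading order, and the relative change in performance when comments are present. It does not state how these were computed. The Python sketch below shows one plausible way to compute such measures. The function names, the representation of each fixation by the source line it landed on, and the "same line or next line" linearity proxy are all assumptions for illustration, not the authors' method. The sample data are invented.

```python
# Hypothetical sketch, not the study's pipeline: each fixation is reduced to the
# 1-based source line it landed on, in temporal order. A real eye-tracking
# pipeline would first map (x, y) gaze coordinates to areas of interest.


def comment_fixation_share(fixation_lines: list[int], comment_lines: set[int]) -> float:
    """Fraction of fixations that fall on lines marked as comments."""
    if not fixation_lines:
        return 0.0
    on_comments = sum(1 for line in fixation_lines if line in comment_lines)
    return on_comments / len(fixation_lines)


def linearity(fixation_lines: list[int]) -> float:
    """Assumed linearity proxy: share of consecutive fixation transitions that
    stay on the same line or move down exactly one line."""
    steps = [b - a for a, b in zip(fixation_lines, fixation_lines[1:])]
    if not steps:
        return 0.0
    return sum(1 for s in steps if s in (0, 1)) / len(steps)


def relative_change(with_comments: float, without_comments: float) -> float:
    """Relative performance change; e.g. -0.30 means a 30% decrease."""
    if without_comments == 0:
        raise ValueError("baseline score must be non-zero")
    return (with_comments - without_comments) / without_comments


if __name__ == "__main__":
    # Invented example data (not from the study).
    fixations = [1, 1, 2, 3, 3, 2, 4, 5, 5, 6]
    comments = {2, 5}
    print(comment_fixation_share(fixations, comments))  # 0.4
    print(round(linearity(fixations), 3))                # 0.778
    print(round(relative_change(0.67, 0.5), 2))          # 0.34
```

Counting transitions that stay on or advance by one line is only one simple notion of linear reading. Published eye-tracking studies of code reading use several finer-grained measures, so this proxy should not be read as the one used here.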
This study was conducted by Abdelsalam et al.