Learning-based techniques show promise for automating software development tasks, but current approaches treat context in an ad hoc manner. Existing techniques select context through arbitrary heuristics, such as fixed token windows, enclosing methods, or entire files, without systematically analyzing which contextual information is relevant for a given task. The goal of the work presented in this dissertation is to systematically leverage different forms of context to improve the effectiveness of AI-assisted software development. First, we present a graph-to-sequence learning approach that captures semantic context through program analysis. By encoding control-flow and data-flow dependencies into a fine-grained graph representation, our approach outperforms state-of-the-art baselines for program repair. Second, we develop a retrieval-based technique for selecting demonstration examples during few-shot prompting. By automatically retrieving relevant examples, our approach outperforms task-specific and fine-tuned models on test assertion generation and program repair. Third, we develop an automated technique for generating issue-reproducing tests from natural language bug reports. Our technique successfully generates reproducing tests for real-world issues, including cases uniquely solved by our approach that were missed by all prior work. Fourth, we characterize the complexity of multi-hunk patches through empirical analysis of real-world bugs. We introduce hunk divergence and spatial proximity metrics that quantify variation among hunks and dispersion across code. Our evaluation reveals that repair accuracy declines sharply with increased divergence, exposing fundamental limitations in how current models reason over dispersed code. Finally, we conduct the first automated systematic study of coding agents on multi-hunk repair. 
Our findings reveal substantial variation across agents in both localization capability and repair accuracy. Collectively, these contributions demonstrate that the choice of contextual information plays a significant role in the effectiveness of AI-assisted software development, and that selecting context systematically rather than through ad hoc heuristics meets the research goal stated above.
Noor Nashid studied this question.