August 2, 2017

Are deep neural networks the best choice for modeling source code?

Abstract

Current statistical language modeling techniques, including deep-learning based models, have proven to be quite effective for source code. We argue here that the special properties of source code can be exploited for further improvements. In this work, we enhance established language modeling approaches to handle the special challenges of modeling source code, such as: frequent changes, larger, changing vocabularies, deeply nested scopes, etc. We present a fast, nested language modeling toolkit specifically designed for software, with the ability to add & remove text, and mix & swap out many models. Specifically, we improve upon prior cache-modeling work and present a model with a much more expansive, multi-level notion of locality that we show to be well-suited for modeling software. We present results on varying corpora in comparison with traditional N-gram, as well as RNN, and LSTM deep-learning language models, and release all our source code for public use. Our evaluations suggest that carefully adapting N-gram models for source code can yield performance that surpasses even RNN and LSTM based deep-learning models.
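The abstract's two central ideas, counts that can be updated as text is added or removed, and a local cache model mixed with a global model, can be illustrated with a small sketch. This is not the toolkit released with the paper: the class `BigramCounter`, the function `mix`, and the fixed weight `lam` are hypothetical names chosen for illustration, the model is an unsmoothed bigram model, and the paper's nested, multi-level locality is reduced here to a single local cache.

```python
from collections import Counter


class BigramCounter:
    """Unsmoothed bigram counts that support adding and removing text."""

    def __init__(self):
        self.bigrams = Counter()   # (previous token, token) -> count
        self.contexts = Counter()  # previous token -> count

    def _update(self, tokens, sign):
        for prev, cur in zip(tokens, tokens[1:]):
            self.bigrams[(prev, cur)] += sign
            self.contexts[prev] += sign

    def add(self, tokens):
        """Count the bigrams of a token sequence (e.g. a newly opened file)."""
        self._update(tokens, +1)

    def remove(self, tokens):
        """Undo a previous add (e.g. the file was changed or closed)."""
        self._update(tokens, -1)

    def prob(self, prev, cur):
        """Relative frequency of cur after prev; 0.0 if prev is unseen."""
        seen = self.contexts[prev]
        return self.bigrams[(prev, cur)] / seen if seen else 0.0


def mix(global_model, local_model, prev, cur, lam=0.5):
    """Interpolate a local cache with a global model.

    lam is an assumed fixed weight on the local model; it is a
    placeholder, not a value taken from the paper.
    """
    return (lam * local_model.prob(prev, cur)
            + (1 - lam) * global_model.prob(prev, cur))


global_model = BigramCounter()
global_model.add(["x", "=", "1", "y", "=", "2"])

local_model = BigramCounter()
local_tokens = ["x", "=", "x"]
local_model.add(local_tokens)

# The local cache has seen "= x"; the global corpus has not.
with_cache = mix(global_model, local_model, "=", "x")

# Removing the local text withdraws its contribution again.
local_model.remove(local_tokens)
without_cache = mix(global_model, local_model, "=", "x")
```

In this sketch the token sequence `= x` receives probability mass only while the local text is in the cache, which is the kind of locality the abstract argues is well-suited to software.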

Cite this study

Hellendoorn et al. (2017) studied this question.

synapsesocial.com/papers/6a0f1f1ea7a2fed64abdbb0e — DOI: https://doi.org/10.1145/3106237.3106290

Also consider

Synapse has enriched 5 closely related papers on similar questions. Consider them for comparative context:

Products, developers, and milestones: how should I build my N-Gram language model · 2015 · 10 citations
T2API: synthesizing API code usage templates from English texts with statistical translation · 2016 · 57 citations
Structure and Performance of a Dependency Language Model · 1997 · 71 citations
Long Short-Term Memory · 1997 · 97,540 citations
Exploiting syntactic structure for language modeling · 1998 · 29 citations

Authors

Vincent J. Hellendoorn

Google (United States)

Premkumar Devanbu

University of California, Davis

Institutions

University of California, Davis
