Many large language models (LLMs) have recently been proposed that show advanced proficiency in code generation, and considerable effort has gone into evaluating them on code generation benchmarks such as HumanEval. Although helpful for comparing different LLMs, existing evaluation focuses on a simple code generation scenario (i.e., function-level or statement-level code generation), which mainly asks LLMs to generate a single code unit (e.g., a function or a statement) for a given natural language description. Because such evaluation targets independent and often small-scale code units, it remains unclear how LLMs perform in real-world software development scenarios.
Du et al. studied this question.
www.synapsesocial.com/papers/68e6f5edb6db64358766fe93 — DOI: https://doi.org/10.1145/3597503.3639219
Xueying Du
Mingwei Liu
Kaixin Wang
Fudan University