Key points are not available for this paper at this time.
Current approaches to testing and debugging NLP models rely on highly variable human creativity and extensive labor, or only work for a very restrictive class of bugs. We present AdaTest, a process which uses large scale language models (LMs) in partnership with human feedback to automatically write unit tests highlighting bugs in a target model. Such bugs are then addressed through an iterative text-fixretest loop, inspired by traditional software development. In experiments with expert and non-expert users and commercial / research models for 8 different tasks, AdaTest makes users 5-10x more effective at finding bugs than current approaches, and helps users effectively fix bugs without adding new bugs. * Equal contribution, author order chosen by casting lots.
Ribeiro et al. (Sat,) studied this question.
Synapse has enriched 3 closely related papers on similar clinical questions. Consider them for comparative context: