Neural language models (NLMs) have frequently been tested on their ability to model grammatical phenomena in a "human-like" way, but their behavior is often not compared to actual human data, leaving it unclear whether that behavior is in fact human-like. Long-distance filler-gap dependencies and their island constraints are a case in point: Wilcox et al. (2018) devised a method to directly test whether NLMs can model these phenomena in a human-like way, and this method comes with specific predictions about how models should perform. Not all of these predictions, however, are supported by experimental research with human participants, which makes it difficult to conclude that NLMs model these phenomena comparably to humans. In the current study, we therefore investigated whether the predictions made by Wilcox et al. (2018) are supported by human data by testing both a Long Short-Term Memory language model and human participants in English with the method devised by Wilcox et al. (2018) and comparing their results. The results showed that the model might exhibit "human-like" behavior according to the predictions of Wilcox et al. (2018), but this behavior was not fully comparable to that of humans. This study thus shows not only that it is important to obtain human data to validate the predictions of this specific method, but also that predictions of "human-like" behavior for NLMs on other grammatical phenomena should always be grounded in data from human experiments.
Suijkerbuijk et al. studied this question.