How do neural language models keep track of number agreement between subject and verb? We show that "diagnostic classifiers", trained to predict number from the internal states of a language model, provide a detailed understanding of how, when, and where this information is represented. Moreover, they give us insight into when and where number information is corrupted in cases where the language model ends up making agreement errors. To demonstrate the causal role played by the representations we find, we then use agreement information to influence the course of the LSTM during the processing of difficult sentences. Results from such an intervention reveal a large increase in the language model's accuracy. Together, these results show that diagnostic classifiers give us an unrivalled detailed look into the representation of linguistic information in neural models, and demonstrate that this knowledge can be used to improve their performance.
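To make the idea of a diagnostic classifier concrete, here is a minimal sketch in Python. It assumes synthetic vectors stand in for real LSTM hidden states, with number (singular or plural) encoded along one hidden direction plus noise. The probe is a plain logistic regression, and `intervene` nudges a state along the gradient of the probe's loss towards the correct label. All function names, dimensions, and step sizes are hypothetical choices for illustration and are not taken from the paper, whose exact classifier and intervention setup may differ.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_states(n, dim, rng):
    """Assumption: stand-in for LSTM hidden states, not real model activations.
    Number is encoded as +/- a fixed random direction with Gaussian noise added."""
    direction = [rng.gauss(0, 1) for _ in range(dim)]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 0 = singular, 1 = plural
        sign = 1.0 if label == 1 else -1.0
        state = [sign * d + rng.gauss(0, 0.5) for d in direction]
        data.append((state, label))
    return data


def predict(weights, bias, state):
    # Probability that the probe assigns to "plural".
    return sigmoid(sum(w * x for w, x in zip(weights, state)) + bias)


def train_probe(data, dim, epochs=20, lr=0.1):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for state, label in data:
            err = predict(weights, bias, state) - label  # dLoss/dlogit
            weights = [w - lr * err * x for w, x in zip(weights, state)]
            bias -= lr * err
    return weights, bias


def accuracy(weights, bias, data):
    correct = sum((predict(weights, bias, s) >= 0.5) == (y == 1) for s, y in data)
    return correct / len(data)


def intervene(weights, bias, state, label, step=0.5):
    """Move a state against the gradient of the probe's cross-entropy loss
    for the correct label; the probe weights themselves stay fixed."""
    err = predict(weights, bias, state) - label
    return [x - step * err * w for x, w in zip(state, weights)]


rng = random.Random(0)
states = make_synthetic_states(500, 8, rng)
train_set, test_set = states[:400], states[400:]

probe_weights, probe_bias = train_probe(train_set, dim=8)
test_acc = accuracy(probe_weights, probe_bias, test_set)
print(f"probe accuracy on held-out states: {test_acc:.2f}")
```

In this toy setting a high held-out accuracy only shows that number is linearly decodable from the synthetic states. With a real model, the states would be collected at each timestep and layer so that the probe's accuracy can be compared across positions, and the adjusted state from `intervene` would be fed back into the LSTM before it processes the next word.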
Giulianelli et al. studied this question.