Pretraining Neural Language Models (NLMs) over a large corpus involves chunking the text into training examples, which are contiguous text segments of a size processable by the neural architecture. We highlight a bias introduced by this common practice: we prove that the pretrained NLM can model much stronger dependencies between text segments that appeared in the same training example than it can between text segments that appeared in different training examples. This intuitive result has a twofold role. First, it formalizes the motivation behind a broad line of recent successful NLM training heuristics, proposed for the pretraining and fine-tuning stages, which do not necessarily appear related at first glance. Second, our result clearly indicates further improvements to be made in NLM pretraining for the benefit of Natural Language Understanding. As an example, we propose "kNN-Pretraining": we show that including related non-neighboring sentences in the same pretraining example improves sentence representations and open domain question answering. This theoretically motivated degree of freedom for pretraining example design indicates new training schemes for self-improving representations.
Levine et al. studied this question.
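To make the contrast between contiguous chunking and kNN-style example construction concrete, the following is a minimal sketch and not the authors' implementation. The function names (`contiguous_examples`, `knn_examples`), the toy corpus, and the use of bag-of-words cosine similarity as the notion of "related" are all assumptions made for illustration; the abstract does not specify how related sentences are retrieved.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words count vector (lowercased whitespace tokens)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contiguous_examples(sentences, size):
    """Common practice: chunk the corpus into runs of adjacent sentences."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def knn_examples(sentences, k):
    """Hypothetical kNN variant: place each sentence in one example together
    with its k most similar sentences from anywhere in the corpus.
    Similarity here is bag-of-words cosine, an assumption for this sketch."""
    vectors = [bow(s) for s in sentences]
    examples = []
    for i, anchor in enumerate(sentences):
        scored = [(cosine(vectors[i], vectors[j]), j)
                  for j in range(len(sentences)) if j != i]
        # Highest similarity first; ties broken by corpus position.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        neighbors = [sentences[j] for _, j in scored[:k]]
        examples.append([anchor] + neighbors)
    return examples


# Toy corpus: two topics interleaved, so related sentences are not adjacent.
corpus = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat slept on the sofa",
    "investors sold stock as prices fell",
]
```

With `contiguous_examples(corpus, 2)`, the two cat sentences fall into different training examples, whereas `knn_examples(corpus, 1)` groups each sentence with its related but non-neighboring counterpart. A realistic pipeline would likely replace the bag-of-words similarity with learned sentence embeddings and an approximate nearest-neighbor index.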