January 1, 2021 · Open Access

When does Further Pre-training MLM Help? An Empirical Study on Task-Oriented Dialog Pre-training

Abstract

Further pre-training language models on in-domain data (domain-adaptive pre-training, DAPT) or task-relevant data (task-adaptive pre-training, TAPT) before fine-tuning has been shown to improve performance on downstream tasks. However, in task-oriented dialog modeling, we observe that further pre-training with the masked language modeling (MLM) objective does not always improve performance on a downstream task. We find that DAPT is beneficial in the low-resource setting, but as the fine-tuning data size grows, DAPT becomes less beneficial or even useless, and scaling up the DAPT data does not help. Through Representational Similarity Analysis, we conclude that more fine-tuning data yields greater change in the model's representations and thus reduces the influence of initialization.
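The abstract's final claim rests on Representational Similarity Analysis (RSA), which compares two models by how they arrange the same inputs relative to one another rather than by their raw coordinates. The following is a minimal, dependency-free sketch under assumptions the abstract does not state: dissimilarity between two inputs is taken as 1 − Pearson correlation of their representations, and two models are compared by the Spearman correlation of their dissimilarity matrices. The function names and toy vectors are hypothetical, and this is not the authors' code; in the paper's setting, `reps_a` and `reps_b` might be, for example, representations of the same dialog inputs taken before and after fine-tuning.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence

Vector = Sequence[float]


def pearson(a: Vector, b: Vector) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)  # undefined if either vector is constant


def rdm_upper(reps: List[Vector]) -> List[float]:
    """Upper triangle of the representational dissimilarity matrix (RDM).

    Assumption: dissimilarity = 1 - Pearson correlation between the
    representations of two inputs; the paper may use another measure.
    """
    return [1.0 - pearson(reps[i], reps[j])
            for i, j in combinations(range(len(reps)), 2)]


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def rsa_score(reps_a: List[Vector], reps_b: List[Vector]) -> float:
    """Spearman correlation between two models' RDMs.

    reps_a[i] and reps_b[i] must encode the same input i.
    Assumption: Spearman (Pearson on ranks) is the comparison statistic.
    """
    return pearson(ranks(rdm_upper(reps_a)), ranks(rdm_upper(reps_b)))


# Hypothetical toy representations of the same four inputs.
before = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.5], [0.4, 0.4, 0.9], [0.7, 0.6, 0.1]]
after = [[0.2, 0.8, 0.3], [0.9, 0.1, 0.6], [0.3, 0.5, 0.8], [0.6, 0.7, 0.2]]
print(f"RSA(before, before) = {rsa_score(before, before):.3f}")
print(f"RSA(before, after)  = {rsa_score(before, after):.3f}")
```

Because each RDM only compares inputs with each other, the score does not depend on the two models sharing a hidden size or coordinate system. A score near 1 means the relative geometry was largely preserved; read this way, the abstract's conclusion would correspond to the before/after score falling further below 1 as the fine-tuning set grows.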

Cite This Study

Zhu et al. (2021). When does Further Pre-training MLM Help? An Empirical Study on Task-Oriented Dialog Pre-training.

synapsesocial.com/papers/69d76994b1cb92dd1bb8aff2
https://doi.org/10.18653/v1/2021.insights-1.9