January 1, 2020 · Open Access

MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

Abstract

This paper presents MixText, a semi-supervised learning method for text classification that uses our newly designed data augmentation method, TMix. TMix creates a large number of augmented training samples by interpolating text in hidden space. Moreover, we leverage recent advances in data augmentation to guess low-entropy labels for unlabeled data, making them as easy to use as labeled data. By mixing labeled, unlabeled, and augmented data, MixText significantly outperformed current pre-trained and fine-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks. The improvement is especially prominent when supervision is extremely limited.
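
The two ideas named in the abstract, interpolating in hidden space and guessing low-entropy labels, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not stated in the abstract: the function names, the Beta-distributed mixing weight, the step that keeps the weight at or above 0.5, and the temperature-based sharpening. Plain Python lists stand in for the encoder hidden states that a real model would interpolate.

```python
import random


def sample_lambda(alpha, rng):
    """Draw a mixing weight from Beta(alpha, alpha) (assumed distribution)."""
    lam = rng.betavariate(alpha, alpha)
    # Assumption: keep the mixed sample closer to the first input.
    return max(lam, 1.0 - lam)


def mix_hidden(h_a, h_b, y_a, y_b, lam):
    """Linearly interpolate two hidden vectors and their label distributions."""
    h_mix = [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return h_mix, y_mix


def sharpen(probs, temperature):
    """Lower the entropy of a distribution: raise to 1/T, then renormalize."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def guess_label(predictions, temperature=0.5):
    """Average class predictions over several views of one unlabeled text,
    then sharpen the average (assumed label-guessing scheme)."""
    n = len(predictions)
    avg = [sum(col) / n for col in zip(*predictions)]
    return sharpen(avg, temperature)


if __name__ == "__main__":
    rng = random.Random(0)
    lam = sample_lambda(0.75, rng)
    # Two toy hidden vectors with one-hot labels for a two-class problem.
    h_mix, y_mix = mix_hidden([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], lam)
    print(h_mix, y_mix)
    # Predictions for the same unlabeled text under two augmented views.
    print(guess_label([[0.6, 0.4], [0.7, 0.3]]))
```

With a temperature below 1, the sharpened guess puts more mass on the leading class than the plain average does, which is one way to obtain the low-entropy labels the abstract mentions.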
