April 1, 2026 | Open Access

SCUNLP-2 at the NTCIR-18 FigArg-2 Task: Apply Repeat-Error-Correction Learning on Text Classification

Key Points

The study aims to enhance text classification performance by addressing the challenge of acquiring labeled datasets in specialized domains.
Trained a base BERT model using available text-label pairs.
Collected misclassified samples from model predictions.
Utilized an LLM to rewrite erroneous texts while retaining original labels.
Reintroduced rewritten texts into the training set for fine-tuning.
Achieved the highest validation set Micro-F1 score of 77.33% after fine-tuning with rewritten texts.
Demonstrated that data augmentation through error correction can improve classification robustness.
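
The Micro-F1 metric cited in the key points pools true positives, false positives, and false negatives across all labels before computing a single F1 value. The following is a minimal sketch of that calculation, not the task's official scorer; the function name `micro_f1` and the example labels are hypothetical. When every sample carries exactly one label, this value coincides with plain accuracy.

```python
from typing import Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all labels, then compute F1 once."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = set(gold) | set(pred)
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1          # predicted this label and it was correct
            elif p == label:
                fp += 1          # predicted this label but gold differs
            elif g == label:
                fn += 1          # gold is this label but it was missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels, for illustration only.
gold = ["a", "a", "b", "c"]
pred = ["a", "b", "b", "c"]
print(micro_f1(gold, pred))
```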

Abstract

Large Language Models (LLMs) have shown promising capabilities for zero-shot text classification, yet they often do not outperform fine-tuned traditional models like BERT when those models are trained on sufficient labeled data. However, acquiring large-scale human-labeled datasets can be challenging, particularly in specialized domains. To address this gap, we propose Repeat-Error-Correction Learning, a framework that iteratively identifies and rewrites misclassified samples to augment the training set. First, we train a base BERT model using available text–label pairs. Next, the trained model infers labels on the same dataset, and we collect the misclassified samples. An LLM, such as GPT-4o-mini, then rewrites these erroneous texts while preserving their original labels. The rewritten texts are reintroduced into the training set, and the model is fine-tuned on this expanded corpus. By iteratively refining the training data through error correction and text rewriting, the proposed method aims to achieve robust classification performance despite limited initial annotations. Our results indicate that fine-tuning the base model by adding rewritten misclassified text achieved the highest validation set Micro-F1 score (77.33%). These findings contribute to a deeper understanding of a cost-friendly and efficient way to generate data for augmenting text classification models.
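
The loop described in the abstract (train, collect misclassified samples, rewrite them with their labels kept, retrain) can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: `repeat_error_correction`, `fit_majority`, and `rewrite_stub` are hypothetical names, the majority-label classifier merely stands in for BERT training, and the string-prefixing rewriter stands in for an LLM call such as GPT-4o-mini. The sketch also calls `fit` afresh on the expanded set; whether training continues from the previous weights is left to whatever `fit` does.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]          # (text, label)
Predictor = Callable[[str], str]   # text -> predicted label


def repeat_error_correction(
    train_set: List[Example],
    fit: Callable[[List[Example]], Predictor],
    rewrite: Callable[[str, str], str],
    rounds: int = 1,
) -> Tuple[Predictor, List[Example]]:
    """Hypothetical sketch of the train / collect errors / rewrite / retrain loop."""
    data = list(train_set)             # copy so the caller's list is untouched
    model = fit(data)                  # step 1: train the base model
    for _ in range(rounds):
        # step 2: predict on the training data and keep the misclassified samples
        errors = [(text, label) for text, label in data if model(text) != label]
        if not errors:
            break
        # step 3: rewrite each erroneous text, keeping its original label
        rewritten = [(rewrite(text, label), label) for text, label in errors]
        # step 4: add the rewrites to the training set and train again
        data.extend(rewritten)
        model = fit(data)
    return model, data


def fit_majority(data: List[Example]) -> Predictor:
    """Toy stand-in for BERT training: always predicts the most frequent label."""
    majority = Counter(label for _, label in data).most_common(1)[0][0]
    return lambda text: majority


def rewrite_stub(text: str, label: str) -> str:
    """Toy stand-in for the LLM rewriting call; a real prompt would request a paraphrase."""
    return "To put it differently: " + text


# Hypothetical seed data, for illustration only.
seed = [("good film", "pos"), ("great cast", "pos"), ("bad film", "neg")]
model, augmented = repeat_error_correction(seed, fit_majority, rewrite_stub)
print(augmented[-1])
```

Passing `fit` and `rewrite` as callables keeps the loop independent of any particular model or LLM client, so a real fine-tuning routine and a real rewriting prompt could be substituted without changing the control flow.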
