Joint entity and relation extraction is a critical task in knowledge representation, but it is often bottlenecked by the need for large amounts of labeled data, which are expensive and laborious to obtain. While semi-supervised learning (SSL) offers a way to leverage unlabeled data, traditional methods are limited in their ability to generate high-quality, diverse augmentations for text. This paper introduces a framework that combines SSL with large language models (LLMs) to improve joint entity and relation extraction, especially in low-resource settings. Our approach uses LLMs to generate semantically coherent and diverse augmented data from unlabeled samples. These augmented samples, together with the limited labeled data, are used to train the extraction model within an SSL framework that employs consistency regularization and pseudo-labeling. Crucially, the framework incorporates an iterative refinement mechanism in which the performance of the SSL component guides parameter-efficient fine-tuning of the LLM, progressively improving both data augmentation and model accuracy. Extensive experiments on four benchmark datasets show that our method significantly outperforms existing state-of-the-art approaches, particularly when labeled data is scarce. The framework is adaptable and can be integrated with various existing joint extraction models, demonstrating its generalizability and practical utility.
Liu et al. studied this question.
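The abstract does not specify the exact SSL objective, so the following is only a minimal sketch of one common way to combine pseudo-labeling with consistency regularization, assuming a FixMatch-style loss applied to per-candidate relation logits. It assumes that the model's prediction on the original unlabeled sentence serves as the pseudo-label source, that the LLM-generated augmentation is the view on which consistency is enforced, and that only pseudo-labels above a confidence `threshold` contribute. The function names `softmax`, `cross_entropy`, and `ssl_loss` and the parameters `threshold` and `lambda_u` are hypothetical, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class (clamped to avoid log(0))."""
    return -math.log(max(probs[target], 1e-12))


def ssl_loss(
    labeled_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    orig_logits: Sequence[Sequence[float]],
    aug_logits: Sequence[Sequence[float]],
    threshold: float = 0.9,
    lambda_u: float = 1.0,
) -> float:
    """Hypothetical FixMatch-style objective: supervised CE + masked pseudo-label CE.

    orig_logits: predictions on the original unlabeled inputs (pseudo-label source).
    aug_logits:  predictions on the LLM-augmented versions of the same inputs
                 (assumed to be aligned one-to-one with orig_logits).
    """
    # Supervised term on the small labeled set.
    sup = (
        sum(cross_entropy(softmax(l), y) for l, y in zip(labeled_logits, labels)) / len(labels)
        if labels
        else 0.0
    )

    # Unsupervised term: confident predictions on the original input become
    # hard pseudo-labels that the augmented view must agree with.
    unsup_terms = []
    for o, a in zip(orig_logits, aug_logits):
        p = softmax(o)
        conf = max(p)
        pseudo = p.index(conf)
        # Low-confidence pseudo-labels are masked out (contribute zero loss).
        unsup_terms.append(cross_entropy(softmax(a), pseudo) if conf >= threshold else 0.0)
    unsup = sum(unsup_terms) / len(unsup_terms) if unsup_terms else 0.0

    return sup + lambda_u * unsup
```

Averaging the masked terms over all unlabeled examples, rather than only the confident ones, follows the FixMatch convention, so a stricter `threshold` simply shrinks the unsupervised contribution early in training when pseudo-labels are unreliable.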