Richer information, such as structural and syntactic detail, can improve Natural Language Processing (NLP) tasks like Open Information Extraction (Open IE), particularly for lower-resource languages such as Portuguese. Knowledge Graphs (KGs) offer a way to unify diverse annotations in a single representation and make it possible to apply Graph Machine Learning (Graph ML). This paper presents a framework for Portuguese Open IE that integrates KGs and Graph ML with Large Language Model (LLM) augmentation. The framework works in three stages: (1) construction of an initial KG from text, (2) Predicate Extraction, and (3) Subject/Object Extraction, with stages 2 and 3 both relying on GraphSAGE models. An LLM (DeepSeek) is used for augmentation, either when Graph ML produces no prediction or to refine and validate extractions. We present two versions of the system, both evaluated on a Portuguese dataset. In the automatic (word-based) evaluation, the best version achieved an F1-score of 64.9% for Predicate Extraction and 89.7% for Subject/Object Extraction, with an end-to-end F1-score of 58.2%. Two annotators also carried out a human evaluation on 51 Portuguese sentences (yielding 100 triples), reaching substantial agreement (Cohen’s Kappa of 0.67). The system extracted an average of 1.84 triples per sentence, of which 53.9% were judged correct. Notably, this version reduced invalid or wrong extractions from 31.7% in the previous version to 6.6%, indicating improved precision while retaining the ability to extract multiple meaningful triples per sentence.
Silva et al. (Mon,) studied this question.
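To make the word-based automatic evaluation more concrete, the following is a minimal sketch of one common way to compute a word-level F1-score between an extracted span and its gold counterpart. The text does not specify the matching scheme, so the bag-of-words overlap, the lowercasing, the whitespace tokenization, and the function name `word_f1` are all assumptions made for illustration and may differ from the procedure actually used.

```python
from collections import Counter


def word_f1(predicted: str, gold: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from word overlap between two spans.

    Assumption: words are compared as a lowercased multiset (bag of words),
    split on whitespace. This is an illustrative scheme, not necessarily
    the one used in the evaluation described above.
    """
    pred_tokens = predicted.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0, 0.0, 0.0

    # Multiset intersection counts each shared word at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical predicate spans, used only to exercise the function.
p, r, f = word_f1("foi fundada em", "foi fundada")
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Under this scheme, a predicted predicate with one extra word is penalized in precision but not in recall, so partially correct extractions still receive partial credit.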