What question did this study set out to answer?

The aim is to enhance Open Information Extraction in Portuguese by integrating Knowledge Graphs and Graph Machine Learning with Large Language Model augmentation.

February 26, 2026 | Open Access

Deepening graph-based approaches for Portuguese open information extraction with LLM augmentation

Key Points

Developed a framework combining Knowledge Graphs with Graph Machine Learning.
Constructed Knowledge Graphs from textual data.
Performed Predicate and Subject/Object Extraction using GraphSAGE models.
Utilized Large Language Models for augmentation during extraction tasks.
Evaluated the system on a Portuguese dataset through automated and human assessments.
Achieved an F1-score of 64.9% for Predicate extraction and 89.7% for Subject/Object extraction.
The overall system attained an F1-score of 58.2%.
Human evaluation on 51 sentences showed substantial agreement with Cohen’s Kappa of 0.67.
Extracted an average of 1.84 triples per sentence, with a correctness rate of 53.9%.
Reduced invalid/wrong extractions from 31.7% to 6.6%, indicating improved Precision.
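
The agreement figure above is Cohen's Kappa, which corrects the raw agreement between two annotators for the agreement expected by chance. The following is a minimal sketch of the standard formula, not the authors' evaluation code; the function name `cohens_kappa` and the toy labels are assumptions made for illustration and are not data from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two marginal rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy judgments (hypothetical, not from the paper).
annotator_1 = ["correct", "correct", "wrong", "correct", "wrong", "wrong"]
annotator_2 = ["correct", "correct", "wrong", "wrong", "wrong", "correct"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Under the commonly used Landis and Koch interpretation, values between 0.61 and 0.80 are read as substantial agreement, which is the band the reported 0.67 falls into.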

Abstract

Utilizing richer information, such as structural and syntactic details, can enhance Natural Language Processing (NLP) tasks like Open Information Extraction (Open IE), particularly for languages with limited resources like Portuguese. Knowledge Graphs (KGs) offer a robust solution by unifying diverse annotations and enabling the application of Graph Machine Learning (Graph ML). This paper presents an advanced framework for Portuguese Open IE, integrating KGs and Graph ML with Large Language Model (LLM) augmentation. Our framework employs a three-stage process: (1) initial KG construction from text, followed by (2) Predicate Extraction and (3) Subject/Object Extraction, both leveraging GraphSAGE models. LLMs (DeepSeek) are used for augmentation when Graph ML predictions are absent, or to refine and validate extractions. We present two versions of the system, which was evaluated on a Portuguese dataset. Automatic (word-based) evaluation of the best version yielded an F1-score of 64.9% for Predicate extraction and 89.7% for Subject/Object extraction. The final end-to-end performance of the system is an F1-score of 58.2%. A human evaluation was conducted on 51 Portuguese sentences (yielding 100 triples) by two annotators, achieving substantial agreement (Cohen’s Kappa of 0.67). The system extracted an average of 1.84 triples per sentence, with 53.9% deemed correct. Notably, this version reduced invalid/wrong extractions from 31.7% in the previous version to 6.6%, demonstrating improved Precision while maintaining the ability to extract multiple meaningful triples.
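
The abstract reports word-based F1-scores without spelling out the matching rule. As an illustration only, the sketch below uses one common formulation: precision and recall computed over the multiset of whitespace-separated, lowercased words shared by a predicted span and a gold span. The tokenization, the function name `word_f1`, and the example strings are assumptions, not details taken from the paper.

```python
from collections import Counter


def word_f1(predicted, gold):
    """Word-overlap precision, recall and F1 between two text spans.

    Assumption: words are whitespace-separated and compared
    case-insensitively; repeated words are matched as a multiset.
    """
    pred_tokens = predicted.lower().split()
    gold_tokens = gold.lower().split()

    # Multiset intersection counts each shared word at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical predicate spans: the prediction misses one gold word.
precision, recall, f1 = word_f1("nasceu em", "nasceu em Lisboa")
```

In this toy case every predicted word is correct, so precision is 1.0, while recall is 2/3 because one gold word is missing. A corpus-level score would aggregate such counts over all extractions; whether the authors average per item or pool the counts is not stated in the abstract.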
