Scientific publishing remains document-centric: much knowledge is embedded in natural language across PDFs, posts, and repositories, which limits machine-assisted discovery and reuse. In generative AI, frequent releases of Large Language Models (LLMs) scatter core facts (architecture, training, parameters, licensing, and applications) across heterogeneous sources, so maintaining a stable, queryable model catalog is difficult. Knowledge-graph-based infrastructures such as the Open Research Knowledge Graph (ORKG) address this gap through structured, machine-actionable descriptions aligned with the FAIR principles. This thesis presents an NLP workflow that parses research papers (e.g., from arXiv), applies LLM-based extraction under the ORKG LLM template, and maps the outputs into the “Generative AI Model Landscape” comparison, including support for multivariant papers. In total, 18 LLMs are evaluated: 10 models following the supervisor’s taxonomy (one vision-language model, three thinking or reasoning-focused models, and six instruction-tuned models spanning small to very large scales), plus eight additional compact open models (1B–8B parameters) run locally for comparison. Evaluation uses property-level precision, recall, and F1 under strict and fuzzy matching, complemented by BERTScore on longer fields. The results characterize extraction quality across model types and highlight recurring failure modes for numerically dense and context-dependent properties. The work yields a reproducible end-to-end pipeline and supports the curation of machine-actionable LLM knowledge in the ORKG.
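To make the evaluation protocol concrete, the following is a minimal sketch of how property-level precision, recall, and F1 could be computed under strict and fuzzy matching. It is not the thesis implementation: the normalization step, the use of `difflib.SequenceMatcher` as the fuzzy similarity, the 0.8 threshold, and the property names in the example are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(value.lower().split())


def is_match(pred: str, gold: str, fuzzy: bool = False,
             threshold: float = 0.8) -> bool:
    """Strict: normalized strings are equal.
    Fuzzy: similarity ratio reaches the (assumed) threshold."""
    p, g = normalize(pred), normalize(gold)
    if not fuzzy:
        return p == g
    return SequenceMatcher(None, p, g).ratio() >= threshold


def property_prf(predicted: dict, gold: dict, fuzzy: bool = False,
                 threshold: float = 0.8) -> tuple:
    """Property-level precision, recall, and F1 for one extracted record.

    A predicted property is a true positive when the same property exists
    in the gold record and its value matches under the chosen mode.
    """
    tp = sum(
        1
        for name, value in predicted.items()
        if name in gold and is_match(value, gold[name], fuzzy, threshold)
    )
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical gold annotation and model output for a single paper.
gold = {
    "model_name": "Llama 2",
    "parameters": "7B",
    "license": "Llama 2 Community License",
}
predicted = {
    "model_name": "llama 2",
    "parameters": "7 billion",
    "license": "Llama 2 Community Licence",
}

strict_scores = property_prf(predicted, gold, fuzzy=False)
fuzzy_scores = property_prf(predicted, gold, fuzzy=True)
print("strict:", strict_scores)
print("fuzzy: ", fuzzy_scores)
```

In this example the spelling variant "Licence" fails strict matching but passes fuzzy matching, while "7 billion" versus "7B" fails both. This is the kind of numerically dense property for which string similarity alone is a weak signal, and where a dedicated numeric normalization would be needed.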
Alaa Kefi (Wed,) studied this question.