What question did this study set out to answer?

The study aims to use LLMs to transform unstructured natural-language instructions into structured JSON, and thereby to improve metadata quality.

March 25, 2026 | Open Access

Extracting information from text using an LLM model – first tests

What does this research mean for the field?

A two-stage processing pipeline built on smaller, instruction-obedient local large language models such as Phi-3 can reliably and deterministically convert unstructured natural-language instructions into strict JSON metadata for scientific workflows, without hallucinated fields. Novelty: methodological. Consensus alignment: neutral.
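As an illustration of how a local model might be queried with repeatable settings, the sketch below builds a request for Ollama's `/api/generate` endpoint with `temperature` set to 0 and a fixed `seed`. The model tag `phi3`, the seed value, the timeout, and the helper names `build_request` and `generate` are assumptions made for this example; the study's own client is a .NET Web API and its exact settings are not given here.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "phi3") -> urllib.request.Request:
    """Build a non-streaming generation request with repeatable sampling settings."""
    payload = {
        "model": model,          # assumed model tag, not confirmed by the source
        "prompt": prompt,
        "stream": False,         # return one JSON document instead of a token stream
        "format": "json",        # ask the model to emit JSON only
        "options": {"temperature": 0, "seed": 42},  # assumed values for repeatability
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "phi3") -> str:
    """Send the request to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate` requires a running Ollama server with the model already pulled. Fixed sampling settings make repeated runs far more consistent, but identical output across different hardware or model versions should not be assumed.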

Key points

Aim: use LLMs to transform unstructured instructions into structured JSON and improve metadata quality.
Combined a locally executed LLM with a .NET Web API for prompting and validation.
Implemented a two-stage processing pipeline: linguistic normalization followed by schema-guided extraction.
Used iterative prompt engineering to obtain deterministic output.
Smaller models such as Phi-3 performed reliably under strong constraints.
Achieved schema-compliant output without free text or hallucinated fields.
The workflow runs offline, which supports reproducibility.
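The two-stage pipeline in the points above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the prompt wording, the field names in `SCHEMA_FIELDS`, and the `fake_model` stand-in are all hypothetical, and the model is passed in as a plain callable so the example runs without any server.

```python
import json
from typing import Callable

# Hypothetical schema fields; the source does not list the real ones.
SCHEMA_FIELDS = ["instrument", "sample_id", "temperature_c"]

# Stage 1 prompt: linguistic normalization (wording is an assumption).
NORMALIZE_PROMPT = (
    "Rewrite the operator instruction below in clear, unambiguous English. "
    "Do not add information.\n\nInstruction: {text}"
)

# Stage 2 prompt: schema-guided extraction (wording is an assumption).
EXTRACT_PROMPT = (
    "Return only a JSON object with exactly these keys: {keys}. "
    "Use null for anything the text does not state. No commentary.\n\n"
    "Text: {text}"
)


def run_pipeline(text: str, generate: Callable[[str], str]) -> dict:
    """Normalize the wording first, then extract JSON from the normalized text."""
    normalized = generate(NORMALIZE_PROMPT.format(text=text)).strip()
    raw = generate(EXTRACT_PROMPT.format(keys=", ".join(SCHEMA_FIELDS), text=normalized))
    return json.loads(raw)  # raises ValueError if the model returned anything but JSON


def fake_model(prompt: str) -> str:
    """Stand-in for a local model so the sketch runs offline."""
    if prompt.startswith("Rewrite"):
        return "Measure sample S-12 on the spectrometer at 21.5 degrees Celsius."
    return '{"instrument": "spectrometer", "sample_id": "S-12", "temperature_c": 21.5}'


result = run_pipeline("pls run S-12 on spectro, 21.5C", fake_model)
```

Separating the stages means the extraction prompt always sees cleaned-up English, which is one plausible reason a small model can stay within a strict schema.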

Abstract

This work investigates the use of Large Language Models (LLMs) to convert unstructured natural‑language instructions into structured JSON metadata suitable for scientific workflows. The system architecture combines a locally executed LLM via Ollama, a .NET Web API responsible for prompting and validation, and a lightweight console client. The processing pipeline operates in two stages: a linguistic normalization step that translates operator input into clear, unambiguous English, followed by schema‑guided extraction that enforces strict JSON structure. Through iterative prompt engineering, the approach achieves deterministic, schema‑compliant output while avoiding free text and hallucinated fields. Experiments show that smaller, instruction‑obedient models such as Phi‑3 provide the most reliable behavior under strong constraints. The resulting workflow is robust, offline‑capable, and well‑suited to institutional environments where reproducibility and metadata quality are essential. Future extensions may include schema expansion, ontology integration, and confidence scoring.
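One way to enforce the strict JSON structure described in the abstract is to validate every model response before accepting it. The sketch below is an assumption about what such a check could look like, using only the Python standard library; the field names and types in `REQUIRED` are hypothetical, since the actual schema is not given.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED = {"instrument": str, "sample_id": str, "temperature_c": (int, float)}


def validate(raw: str) -> dict:
    """Parse model output and raise ValueError unless it matches the schema exactly."""
    try:
        data = json.loads(raw)  # any free text around the JSON makes this fail
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")

    extra = set(data) - set(REQUIRED)
    missing = set(REQUIRED) - set(data)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")  # likely hallucinated
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    for key, expected in REQUIRED.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {key!r} has the wrong type")
    return data


record = validate('{"instrument": "spectrometer", "sample_id": "S-12", "temperature_c": 21.5}')
```

A response such as `Sure! {"instrument": ...}` fails at the parsing step, and a response with an extra key such as `"operator"` fails the field check, so neither free text nor invented fields reach the stored metadata.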


Cite This Study

Francesco Carraro studied this question.

synapsesocial.com/papers/69c37b62b34aaaeb1a67db09 https://doi.org/10.5281/zenodo.19184107
