What question did this study set out to answer?

April 10, 2026 · Open Access

Evolutionary Multi-Objective Prompt Learning for Synthetic Text Data Generation with Black-Box Large Language Models

Key Points

To automate prompt learning for generating high-quality synthetic text datasets using black-box large language models.
Introduced EVOLMD-MO, an evolutionary framework for prompt optimization.
Formulated prompt optimization as a multi-objective search problem.
Implemented genetic operators to evolve candidate prompts with two objectives: semantic fidelity and generative diversity.
Integrated a modular multi-agent architecture that decouples prompt evolution, LLM interaction, and evaluation mechanisms.
Used the NSGA-II algorithm to discover Pareto-optimal prompts.
Consistently improved prompt quality across generations while balancing fidelity and diversity.
Compared with a single-objective baseline, explored a significantly broader semantic search space.
Produced more diverse yet semantically coherent synthetic datasets.
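
The two objectives named above, semantic fidelity and generative diversity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fidelity` and `diversity` are hypothetical, and a bag-of-words cosine similarity stands in for whatever embedding model and metrics the paper actually uses.

```python
import math
from collections import Counter
from itertools import combinations

def bow(text):
    """Bag-of-words term counts (assumption: a stand-in for a sentence embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def fidelity(samples, reference):
    """Mean similarity of each generated sample to the reference text."""
    if not samples:
        return 0.0
    ref = bow(reference)
    return sum(cosine(bow(s), ref) for s in samples) / len(samples)

def diversity(samples):
    """Mean pairwise dissimilarity among the generated samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(bow(a), bow(b)) for a, b in pairs) / len(pairs)

# Hypothetical outputs produced by one candidate prompt.
samples = ["flood waters rising near the bridge", "bridge closed after heavy flood"]
objectives = (fidelity(samples, "heavy flood closes the bridge"), diversity(samples))
```

A candidate prompt would then be scored by the pair `objectives`, with both values to be maximized.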

Abstract

High-quality training data are essential for the performance and generalization of artificial intelligence systems, particularly in dynamic environments such as adaptive stream processing for disaster response. However, constructing large and representative datasets remains costly and time-consuming, especially in domains where real data are scarce or difficult to obtain. Large Language Models (LLMs) provide powerful capabilities for synthetic text generation, yet the quality of generated data strongly depends on the design of input prompts. Prompt engineering is therefore critical, but it remains largely manual and difficult to scale, particularly in black-box settings where model internals are inaccessible. This work introduces EVOLMD-MO, a multi-objective evolutionary framework for automated prompt learning aimed at generating high-quality synthetic text datasets using black-box LLMs. The proposed approach formulates prompt optimization as a multi-objective search problem in which candidate prompts evolve through genetic operators guided by two complementary objectives: semantic fidelity to reference data and generative diversity of the produced samples. To support scalable optimization, the framework integrates a modular multi-agent architecture that decouples prompt evolution, LLM interaction, and evaluation mechanisms. The evolutionary process is implemented using the NSGA-II algorithm, enabling the discovery of diverse Pareto-optimal prompts that balance semantic preservation and diversity. Experimental evaluation using large-scale disaster-related social media data demonstrates that the proposed approach consistently improves prompt quality across generations while maintaining a stable trade-off between fidelity and diversity. Compared with a single-objective baseline, EVOLMD-MO explores a significantly broader semantic search space and produces more diverse yet semantically coherent synthetic datasets. 
These results indicate that multi-objective evolutionary prompt learning constitutes a promising strategy for black-box LLM-driven data generation, with potential applicability to adaptive data analytics and real-time decision-support systems in highly dynamic environments, pending broader validation across domains and models.
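
To make "Pareto-optimal prompts" concrete, the following sketch extracts the non-dominated set from a list of (fidelity, diversity) scores. It covers only the first front of non-dominated sorting; the full NSGA-II algorithm also ranks later fronts and uses crowding distance during selection. The scores and the names `dominates` and `pareto_front` are illustrative assumptions, not values or code from the study.

```python
def dominates(p, q):
    """True if p is no worse than q on every objective and strictly better on one.

    Both objectives are assumed to be maximized.
    """
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def pareto_front(scores):
    """Return the indices of score tuples that no other tuple dominates."""
    return [
        i for i, p in enumerate(scores)
        if not any(dominates(q, p) for j, q in enumerate(scores) if j != i)
    ]

# Hypothetical (fidelity, diversity) scores for five candidate prompts.
scores = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9), (0.9, 0.1)]
front = pareto_front(scores)  # indices of the trade-off prompts
```

Here the prompts at indices 0, 1, and 3 form the front: each one beats the others on at least one objective, so none can be discarded without giving up either fidelity or diversity.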

Cite this study

Pastrián et al. (2026) studied this question.

synapsesocial.com/papers/69d895d86c1944d70ce06f39 — DOI: https://doi.org/10.3390/app16083623

Also consider

Synapse has enriched 5 closely related papers on similar questions. Consider them for comparative context:

Prompt’s Evolution for Language Model-Driven Data Generation · 2025 · 1 citation
The use of MMR, diversity-based reranking for reordering documents and producing summaries · 1998 · 2,135 citations
Self-adaptive processing graph with operator fission for elastic stream processing · 2016 · 47 citations
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks · 2019 · 10,749 citations
Multi-Criteria Decision Making (MCDM) Methods and Concepts · 2023 · 693 citations

Authors

Diego Pastrián

Diego Portales University

Nicolás Hidalgo

Diego Portales University

Víctor Reyes

Diego Portales University

Journals

Applied Sciences

Institutions

Universitat de València

Universitat Politècnica de València

Diego Portales University
