This paper introduces MLAgents, a multi-agent system for end-to-end ML model synthesis that emphasizes data privacy and adaptive optimization. In its closed-loop control architecture, LLM agents generate pipelines that are executed locally in a secure environment, protecting sensitive data. The core contribution is a non-LLM Improvement Agent that treats pipeline modification as a multi-armed bandit problem. This agent uses Thompson sampling, a Bayesian adaptive policy, to balance exploration and exploitation based on empirical feedback. We demonstrate that this principled approach achieves competitive performance while adhering to strict privacy-by-design principles.
Kuźniar et al. (Wed,) studied this question.
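
The bandit formulation in the abstract can be illustrated with a minimal Beta-Bernoulli Thompson sampling sketch. This is not the paper's implementation. It assumes that each arm is a candidate pipeline modification and that the feedback is binary (the modification either improved a validation score or did not). The arm names, the `ThompsonSampler` class, and the `simulate` helper are hypothetical and exist only for illustration.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms.

    Assumption: each arm is a hypothetical pipeline modification and the
    reward is binary (1 if the modification improved the score, else 0).
    """

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior over each arm's success rate.
        self.alpha = {arm: 1.0 for arm in arms}
        self.beta = {arm: 1.0 for arm in arms}

    def select(self):
        """Draw one sample per arm from its posterior and pick the largest."""
        draws = {
            arm: self.rng.betavariate(self.alpha[arm], self.beta[arm])
            for arm in self.alpha
        }
        return max(draws, key=draws.get)

    def update(self, arm, improved):
        """Update the chosen arm's posterior with the observed outcome."""
        if improved:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0

    def posterior_mean(self, arm):
        return self.alpha[arm] / (self.alpha[arm] + self.beta[arm])


def simulate(true_rates, rounds, seed=0):
    """Run the sampler against simulated arms with fixed success rates.

    `true_rates` stands in for real pipeline evaluations, which would be
    run locally in the setting the abstract describes.
    """
    env_rng = random.Random(seed + 1)
    sampler = ThompsonSampler(list(true_rates), seed=seed)
    pulls = {arm: 0 for arm in true_rates}
    for _ in range(rounds):
        arm = sampler.select()
        improved = env_rng.random() < true_rates[arm]
        sampler.update(arm, improved)
        pulls[arm] += 1
    return sampler, pulls
```

Calling `simulate({"tune_hyperparameters": 0.9, "swap_model": 0.1, "add_features": 0.1}, rounds=500)` returns the sampler together with per-arm pull counts. Because arms are chosen by sampling from the posterior rather than by taking the current best estimate, uncertain arms still get tried early on, while pulls concentrate on the arm with the highest observed success rate as evidence accumulates.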