Summary

Generative artificial intelligence (GenAI) has advanced rapidly across modalities, from text-to-text large language models to text-to-image and text-to-video diffusion models. Here, we investigate text-to-model generation: whether GenAI can map semantic task descriptions to functional neural network parameters for personalized classification. We present Tina, a text-conditioned neural network diffusion model that leverages a diffusion transformer conditioned on contrastive language-image pre-training (CLIP)-embedded task descriptions. Tina generates high-quality personalized classifiers across domains, including natural and medical images, from text prompts at inference time. We demonstrate that Tina achieves both in-distribution and out-of-distribution personalization, supports zero-shot/few-shot image prompts, generalizes to unseen classes, and scales to more complex tasks. Tina establishes text-to-model GenAI as a promising paradigm for on-demand personalization and offers a new channel for human-AI interaction through natural-language instructions.
Li et al. studied this question.
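
To make the text-to-model idea concrete, the following toy sketch shows the general shape of the inference-time interface: a task description is embedded, a parameter vector is sampled from noise and iteratively refined under that condition, and the result is reshaped into a classifier head. This is not Tina's implementation. Every name here (`embed_text`, `toy_denoiser`, `sample_classifier`, `classify`) is hypothetical; the hash-based embedding merely stands in for a CLIP text encoder, the denoiser is an untrained placeholder for a diffusion transformer, and the refinement loop is a simplified deterministic interpolation rather than a real diffusion sampler.

```python
import hashlib
import math
import random


def embed_text(prompt, dim=8):
    """Hypothetical stand-in for a CLIP text encoder (assumption, not CLIP):
    a deterministic, unit-norm vector derived from a hash of the prompt."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    vec = [(b / 255.0) * 2.0 - 1.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def toy_denoiser(noisy_params, cond, t):
    """Untrained placeholder for a conditional diffusion transformer.
    It 'predicts' clean parameters by tiling the condition vector and ignores
    the noisy input and timestep; a real model would learn this mapping from
    pairs of task descriptions and trained weights."""
    return [cond[i % len(cond)] for i in range(len(noisy_params))]


def sample_classifier(prompt, n_classes=3, n_features=4, steps=10, seed=0):
    """Generate a (n_classes x n_features) weight matrix from a text prompt."""
    rng = random.Random(seed)
    cond = embed_text(prompt)
    n_params = n_classes * n_features
    # Start from Gaussian noise over the flattened parameter vector.
    params = [rng.gauss(0.0, 1.0) for _ in range(n_params)]
    for step in range(steps, 0, -1):
        t = step / steps
        pred = toy_denoiser(params, cond, t)
        # Simplified deterministic update (an assumption, not a DDPM/DDIM rule):
        # move a growing fraction of the way toward the predicted parameters,
        # reaching the prediction at the final step.
        alpha = 1.0 / step
        params = [p + alpha * (q - p) for p, q in zip(params, pred)]
    # Reshape the flat vector into one weight row per class.
    return [params[r * n_features:(r + 1) * n_features] for r in range(n_classes)]


def classify(weights, features):
    """Apply the generated linear head: return the index of the top-scoring class."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    head = sample_classifier("classify images of cats, dogs, and birds")
    print(classify(head, [0.2, -0.1, 0.7, 0.4]))
```

The point of the sketch is the data flow only: the prompt alone determines the generated weights at inference time, with no gradient steps on the downstream task. The generated head here carries no real predictive ability, since nothing was trained.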