Multi-agent large language models (LLMs) were evaluated for Process Systems Engineering (PSE) tasks. Two multi-agent systems (Dyad 2.1 and Claude Opus 4.6) were compared with two single-agent baselines (ChatGPT 5.2 and Google Gemini Pro 3) in three crystallization-centered case studies spanning soft sensing, mechanistic modeling, and nonlinear model predictive control (NMPC). In Case Study 1, ATR-FTIR calibration models were built to predict paracetamol mole fraction from spectra, temperature, and solvent composition. All LLMs converged to latent-variable linear chemometric models and achieved near-linear parity with $R^2$ close to unity across training, validation, and test sets. In Case Study 2, all systems reconstructed a moment-based population balance model (PBM) coupled to a solute mass balance for potassium sulfate dissolution and seeded crystallization, but robustness differed. For dissolution, multi-agent workflows maintained stronger validation performance, while the single-agent baseline showed the largest degradation, including strongly reduced $R^2$ for the moments of the crystal size distribution (CSD). For crystallization, only the multi-agent PBMs reproduced the dynamics consistently and were retained for model-updating tests under dataset shift. In Case Study 3, LLMs proposed NMPC formulations for batch crystallization of potassium sulfate, in which temperature was manipulated to regulate the mean size $\bar{L}_{10}$ and crystal mass $m$ under constraints. All NMPCs were implementable at a 1 min sampling time and achieved near set-point tracking with moderate control effort over five scenarios; Dyad provided the best overall closed-loop performance.

• Specialized multi-agent LLMs were evaluated on chemical engineering tasks.
• LLMs were applied to develop solutions for sensing, modeling, and nonlinear control.
• ATR-FTIR calibration models achieve $R^2 > 0.97$ with automated feature selection.
• Population balance models are iteratively refined and recover correct equilibrium behavior.
• NMPC formulations reach set-points with computation times below 20 s per control move.
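To make the phrase "moment-based PBM coupled to a solute mass balance" concrete, the following is a minimal sketch of such a model for seeded batch crystallization. It does not reproduce the models reported in the study: the power-law growth and nucleation kinetics, the constant solubility (isothermal operation), the omission of dissolution, the seed moments, and every parameter value are assumptions chosen only for illustration. All function and constant names are hypothetical.

```python
# Illustrative sketch only: kinetics, parameters, and seed values are assumed,
# not taken from the study. Moments are per kg of solvent.
RHO_C = 2660.0           # crystal density, kg/m^3 (illustrative value)
K_V = 1.5                # volume shape factor (assumed)
C_SAT = 0.12             # solubility, kg solute / kg solvent (assumed constant)
K_G, G_EXP = 1e-7, 1.0   # growth rate constant (m/s) and exponent (assumed)
K_B, B_EXP = 1e8, 2.0    # nucleation rate constant and exponent (assumed)


def rhs(state):
    """Time derivatives of [mu0, mu1, mu2, mu3, C] for growth plus nucleation."""
    mu0, mu1, mu2, mu3, conc = state
    s = max((conc - C_SAT) / C_SAT, 0.0)   # relative supersaturation, clipped at 0
    growth = K_G * s ** G_EXP              # size-independent growth rate, m/s
    nucleation = K_B * s ** B_EXP * mu3    # secondary nucleation at zero size
    return [
        nucleation,                          # d(mu0)/dt = B
        growth * mu0,                        # d(mu1)/dt = 1 * G * mu0
        2.0 * growth * mu1,                  # d(mu2)/dt = 2 * G * mu1
        3.0 * growth * mu2,                  # d(mu3)/dt = 3 * G * mu2
        -3.0 * RHO_C * K_V * growth * mu2,   # solute consumed by crystal growth
    ]


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return [b + h * k for b, k in zip(base, slope)]

    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2.0))
    k3 = rhs(shifted(state, k2, dt / 2.0))
    k4 = rhs(shifted(state, k3, dt))
    return [
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def simulate(state, dt, n_steps):
    """Integrate the moment model over n_steps steps of size dt (seconds)."""
    for _ in range(n_steps):
        state = rk4_step(state, dt)
    return state


def mean_size(state):
    """Number-weighted mean crystal size, L10 = mu1 / mu0, in metres."""
    return state[1] / state[0]


def total_solute(state):
    """Dissolved plus crystalline solute per kg solvent (conserved quantity)."""
    return state[4] + RHO_C * K_V * state[3]


if __name__ == "__main__":
    # Hypothetical seed: 1e6 crystals per kg solvent, all 100 micrometres in size.
    seed = [1e6, 1e2, 1e-2, 1e-6, 0.15]
    final = simulate(seed, dt=1.0, n_steps=3600)
    print("mean size (m):", mean_size(seed), "->", mean_size(final))
    print("total solute :", total_solute(seed), "->", total_solute(final))
```

Because nucleation is assumed to occur at zero size, the mass balance depends only on the growth term, so the sum of dissolved and crystalline solute should stay constant along the trajectory. That invariant is a convenient sanity check when comparing candidate PBM implementations.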
Lima et al. (Sun,) studied this question.