The growing integration of generative artificial intelligence in logistics demands efficient simulation modeling. This study evaluates two generative large language models (LLMs), Perplexity and ChatGPT, for discrete-event simulation in ExtendSim. It focuses on modeling a real, complex manufacturing system that yields 9721 tons of output. Three scenarios were assessed: autonomous model creation, output estimation from process descriptions and parameters, and copilot-guided manual building. The LLMs could not autonomously construct ExtendSim models because of the lack of APIs. Output estimation matched benchmarks only after iterative prompt refinement, achieving errors of 0.1% for Perplexity and 1.2% to 22.8% for ChatGPT. Estimation without substantial human intervention proved infeasible. Only the copilot approach appeared viable, despite initial errors. It enabled a validated model with an output of 9718 tons after 25 errors for Perplexity and 22 for ChatGPT were resolved through iterative refinement. Approximately 28% (Perplexity) or 32% (ChatGPT) of the errors were hallucinations. The copilot approach reduced development time from several days to 8–10 h. Human expertise remained essential for verifying model outputs and addressing hallucinations and logical flaws. Consequently, this approach may be less feasible for inexperienced users. The copilot paradigm offers practical acceleration for experienced users; however, its limitations underscore the need for API integration and retrieval-augmented generation enhancements.
Straka et al. studied this question.