Purpose This study aims to enhance supply chain risk identification by introducing an AI-driven framework “SCREWS” leveraging Large Language Models (LLMs). It explores the optimization of operational parameters, particularly Temperature and Top P, to improve classification accuracy.
Design/methodology/approach A Design of Experiments approach is used to systematically evaluate the influence of operational LLM parameters on binary classification performance. Experiments with varied configurations were conducted using data from a randomized set of news articles to identify the optimal parameter settings.
Findings Temperature significantly impacts classification precision, with optimal values identified in the range of 0.4–0.7. Conversely, the Top P parameter showed limited influence. The study establishes a robust methodology for balancing randomness and determinism in LLM outputs to achieve reliable classifications.
Research limitations/implications The study focuses on binary classification and a fixed model, limiting generalizability to multi-class scenarios. The fixed LLM architecture used restricts insights into the effects of model variability. Future research should explore broader outputs than binary classification and diverse model architectures.
Practical implications This research provides actionable insights for deploying and improving LLMs in dynamic supply chain environments. The findings emphasize precise parameter tuning as critical for effective risk identification, enabling practitioners to improve operational resilience and decision-making accuracy.
Social implications Calibrating LLM parameters strengthens early warning in supply chains. Better tuned classification reduces false alarms and missed incidents, enabling faster, targeted interventions. For critical goods such as food and medicines, this means steadier availability, fewer stockouts, and more efficient allocation of scarce resources.
Originality/value This study addresses a key gap in AI-driven supply chain risk identification research by systematically optimizing operational LLM parameters. It offers a scalable, replicable methodology applicable to various domains requiring high-stakes decision-making.
Kühl et al. (Thu,) studied this question.
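The Design of Experiments setup summarized in the abstract can be illustrated with a minimal sketch of a full-factorial sweep over Temperature and Top P, scoring each configuration by binary-classification precision. This is an illustration under stated assumptions, not the study's implementation: the factor levels, the function names (`run_full_factorial`, `main_effect`, `dummy_classify`) and the keyword-based stand-in classifier are all hypothetical, and a real run would replace `dummy_classify` with a call to the LLM under test.

```python
import itertools
from statistics import mean


def dummy_classify(text, temperature, top_p):
    """Hypothetical stand-in for an LLM call.

    It ignores both sampling parameters and flags an article as a risk
    when it mentions "disruption". Swap in a real model call here.
    """
    return "disruption" in text.lower()


def precision(predictions, labels):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is flagged."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    return tp / (tp + fp) if (tp + fp) else 0.0


def run_full_factorial(articles, classify, temperatures, top_ps):
    """Evaluate every (temperature, top_p) combination on the same articles.

    `articles` is a list of (text, is_risk) pairs with hand-labelled truth.
    Returns a dict mapping (temperature, top_p) to precision.
    """
    labels = [is_risk for _, is_risk in articles]
    results = {}
    for temperature, top_p in itertools.product(temperatures, top_ps):
        predictions = [classify(text, temperature, top_p) for text, _ in articles]
        results[(temperature, top_p)] = precision(predictions, labels)
    return results


def main_effect(results, factor_index):
    """Mean precision per level of one factor, averaged over the other.

    factor_index 0 selects Temperature, 1 selects Top P.
    """
    by_level = {}
    for key, value in results.items():
        by_level.setdefault(key[factor_index], []).append(value)
    return {level: mean(values) for level, values in by_level.items()}


if __name__ == "__main__":
    # Assumed example data and factor levels; not taken from the study.
    sample = [
        ("Port strike causes disruption for carriers", True),
        ("Supplier celebrates its anniversary", False),
    ]
    grid = run_full_factorial(sample, dummy_classify, [0.0, 0.4, 0.7, 1.0], [0.5, 0.9, 1.0])
    print(main_effect(grid, 0))  # main effect of Temperature
    print(main_effect(grid, 1))  # main effect of Top P
```

Comparing the spread of the per-level means for each factor gives a first, informal view of which parameter matters more. Because LLM outputs are stochastic at non-zero Temperature, a real experiment would also replicate each cell several times before drawing conclusions.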