The job shop scheduling problem (JSSP) is a paradigmatic, strongly NP-hard combinatorial optimisation problem that underpins production planning in modern manufacturing systems, and constraint programming (CP) has become one of the leading methodologies for tackling it. However, comparative studies of CP solvers for the JSSP have so far been restricted to a single benchmark family, a single instance-size range, or a single hardware setting, which limits the practical guidance they offer to researchers and practitioners. This paper presents a controlled empirical evaluation of four state-of-the-art CP solvers, namely IBM ILOG CP Optimizer, Google OR-Tools (CP-SAT), Hexaly, and OptalCP, on the makespan-minimisation JSSP. The four engines are run with default parameters and a uniform 600 s wall-clock time budget on 332 instances drawn from nine canonical benchmark families (Fisher–Thompson, Lawrence, Adams–Balas–Zawack, Applegate–Cook, Yamada–Nakano, Storer–Wu–Vaccari, Taillard, Demirkol–Mehta–Uzsoy, and Da Col–Teppan), ranging in size from 6×6 to 1000×1000 (jobs × machines). OptalCP emerges as the most robust engine overall, certifying optimality on 191 of the 332 instances (57.5%) with the smallest average optimality gap (3.55%), followed by CP Optimizer (166 optima), OR-Tools (144), and Hexaly (116). Hexaly, however, dominates on industrial-scale problems and produces the bulk of the 22 new best-known upper bounds and the one new best-known lower bound reported here. A Friedman test followed by Nemenyi post hoc comparisons confirms that OptalCP attains significantly smaller optimality gaps than the other three engines (p<0.001). Solver competitiveness depends sharply on instance size and on the ratio n/m of jobs to machines, with square instances confirmed as the hardest case.
In practical terms, these findings support an instance-aware approach to CP solver selection: OptalCP is the default choice for small to large instances of moderate aspect ratio, whereas Hexaly is preferable for industrial-scale problems with tens of thousands of operations or extreme n/m ratios, where it is the only engine that reliably returns high-quality feasible schedules within the time budget.
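To make the two quantities behind these comparisons concrete, the following minimal sketch computes a relative optimality gap and the Friedman rank statistic over a table of per-instance gaps. It is an illustration only and not the evaluation code used in the study: the gap definition (UB − LB)/UB is an assumption, the function names are hypothetical, the statistic omits the tie correction, and the numbers in the usage example are invented for demonstration.

```python
from typing import List, Sequence


def optimality_gap(upper_bound: float, lower_bound: float) -> float:
    """Relative gap in percent, ASSUMING gap = (UB - LB) / UB."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    return 100.0 * (upper_bound - lower_bound) / upper_bound


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(gaps: List[Sequence[float]]) -> float:
    """Friedman chi-square without tie correction.

    Rows are instances (blocks), columns are solvers (treatments).
    """
    n = len(gaps)       # number of instances
    k = len(gaps[0])    # number of solvers
    rank_sums = [0.0] * k
    for row in gaps:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


if __name__ == "__main__":
    # Invented gaps (%) for three instances and four solvers, illustration only.
    example_gaps = [
        [0.0, 1.2, 2.5, 4.0],
        [0.5, 1.0, 3.0, 6.0],
        [1.0, 2.0, 2.5, 3.5],
    ]
    print(optimality_gap(upper_bound=100.0, lower_bound=95.0))
    print(friedman_statistic(example_gaps))
```

Under the null hypothesis that all solvers perform alike, the statistic is approximately chi-square distributed with k − 1 degrees of freedom for sufficiently many instances; a large value motivates pairwise post hoc comparisons such as the Nemenyi test.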
Yuraszeck et al. (Wed,) studied this question.