Reinforcement learning (RL) has demonstrated immense success in modeling complex physics-driven systems, providing end-to-end trainable solutions that interact with a simulated or real environment to maximize a scalar reward signal. In this work, building upon previous work, we propose an end-to-end multi-agent RL approach with assignment constraints for reconstructing particle tracks in pixelated particle detectors. Our approach collaboratively optimizes a parameterized policy, which functions as a heuristic for a multidimensional assignment problem, by jointly minimizing the total amount of particle scattering over the reconstructed tracks in a readout frame. To satisfy the constraints that guarantee a unique assignment of particle hits, we propose a safety layer that solves a linear assignment problem for every joint action. Further, to enforce cost margins, which increase the distance of the local policies' predictions to the decision boundaries of the optimizer mappings, we recommend an additional component in the blackbox gradient estimation that forces the policy toward solutions with lower total assignment costs. We empirically show the effectiveness of our approach, compared to multiple single- and multi-agent baselines, on simulated data generated for a particle detector developed for proton imaging. We further demonstrate the effectiveness of constraints with cost margins for both optimization and generalization, as reflected in wider regions of high reconstruction performance and in reduced predictive instabilities. Our results form the basis for further developments in RL-based tracking, offering both enhanced performance with constrained policies and greater flexibility in optimizing tracking algorithms through the option of individual and team rewards.
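To illustrate the role of the safety layer, the following minimal sketch projects a conflicting joint action onto a unique assignment by solving a small linear assignment problem. It is an illustration under stated assumptions, not the implementation used in this work: the function name `safe_joint_action`, the cost matrix, and its values are hypothetical, and the exhaustive search stands in for a polynomial-time solver (such as the Hungarian algorithm) that a practical system would likely use.

```python
from itertools import permutations


def safe_joint_action(costs):
    """Return a conflict-free assignment of agents to hits with minimal total cost.

    costs[i][j] is the (hypothetical) cost of agent i selecting hit j,
    e.g. a negated policy score. Assumes n_agents <= n_hits.
    Exhaustive search: only suitable for very small problems.
    """
    n_agents = len(costs)
    n_hits = len(costs[0])
    best_total, best_assignment = float("inf"), None
    # Every permutation assigns a distinct hit to each agent by construction.
    for perm in permutations(range(n_hits), n_agents):
        total = sum(costs[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_assignment = total, perm
    return list(best_assignment), best_total


# Hypothetical per-agent costs over three candidate hits in the next layer.
costs = [
    [0.1, 0.9, 0.8],
    [0.2, 0.7, 0.9],
    [0.9, 0.3, 0.4],
]

# Independent greedy choices: agents 0 and 1 both select hit 0 (a conflict).
greedy = [min(range(len(row)), key=row.__getitem__) for row in costs]

# The safety layer resolves the conflict at minimal total assignment cost.
safe, total = safe_joint_action(costs)
```

In this toy case the independent greedy choices collide on one hit, whereas the assignment step returns a joint action in which every hit is used at most once.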