We present two novel, fast gradient based optimizer algorithms with dynamic learning rate. The main idea is to adapt the learning rate α by situational awareness, mainly striving for orthogonal neighboring gradients. The method has a high success and fast convergence rate and relies much less on hand-tuned hyper-parameters, providing greater universality. It scales linearly (of order O(n)) with dimension and is rotation invariant, thereby overcoming known limitations. The method is presented in two variants C2Min and P2Min, with slightly different control. Their impressive performance is demonstrated by experiments on several benchmark data-sets (ranging from MNIST to Tiny ImageNet) against the state-of-the-art optimizers Adam and Lion.
Kleinsorge et al. (Fri,) studied this question.