The optimizer is a critical element of neural networks because it computes their optimal parameters through a training process. The Adam optimizer is considered the state of the art in deep learning. However, one drawback is the cost of storing and computing the gradients. A useful tool for addressing this issue is the wavelet transform, and another relevant tool is the fractional derivative, which can be used to create fractional gradient optimizers. This research combines the wavelet transform and fractional optimizers to propose FAdamWav, a fractional version of Adam that uses (i) a parametric discrete wavelet transform to theoretically save 50%, 75%, or 87.5% of the gradient memory with one, two, or three transformation levels, respectively, and (ii) a fractional gradient to optimize the neural network parameters. Experiments indicate that the memory actually saved is lower than these theoretical bounds, but memory is still saved, and the fractional wavelet-based optimizers perform competitively with their non-fractional and non-wavelet counterparts.
Herrera-Alcántara et al. studied this question.
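The theoretical savings of 50%, 75%, and 87.5% quoted above follow from the fact that each level of a discrete wavelet transform halves the number of approximation coefficients. The sketch below illustrates only that arithmetic. It assumes a plain Haar transform in place of the parametric wavelet used in the work, and it assumes that memory is saved by storing only the approximation coefficients; the function names `haar_approximation` and `saved_fraction` are hypothetical and are not taken from FAdamWav.

```python
import math


def haar_approximation(values, levels):
    """Return the Haar approximation coefficients after `levels` steps.

    Assumption: the detail coefficients are discarded, so only this
    shorter list would need to be stored.
    """
    approx = list(values)
    for _ in range(levels):
        if len(approx) % 2:
            # Pad odd lengths by repeating the last value.
            approx.append(approx[-1])
        # Orthonormal Haar low-pass filter: scaled sum of each pair.
        approx = [
            (approx[i] + approx[i + 1]) / math.sqrt(2)
            for i in range(0, len(approx), 2)
        ]
    return approx


def saved_fraction(original_len, levels):
    """Fraction of entries no longer stored after `levels` Haar steps."""
    stored = len(haar_approximation([0.0] * original_len, levels))
    return 1.0 - stored / original_len


# A hypothetical flattened gradient with 1024 entries.
for levels in (1, 2, 3):
    print(levels, saved_fraction(1024, levels))
# prints:
# 1 0.5
# 2 0.75
# 3 0.875
```

In this sketch the bounds are reached exactly only when the vector length is divisible by 2 raised to the number of levels. For other lengths the padding step stores slightly more coefficients, so the saved fraction falls just below the bound.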