Deep Neural Networks have become the tool of choice for Machine Learning practitioners today. They have been successfully applied to a large class of learning problems in both industry and academia, with applications in fields such as Computer Vision, Natural Language Processing, Big Data Analytics and Bioinformatics. One important aspect of designing a neural network is the choice of the activation function used at the neurons of the different layers. Activation functions introduce non-linearity into the neural network model so that the network can progressively learn more effective feature representations. Several different activation functions have been used in the literature. However, Linear, Sigmoid, Tanh and ReLU are the most commonly used, and they are often selected empirically during the network design phase rather than through a proper data-driven process. In this work, we empirically study the problem of generalizing the single-output ReLU activation by parameterizing it, so that data-driven methods can be used to select among variations of the single-output ReLU. We call this class of activations the Generalized ReLU Activations. Its special cases include ReLU itself as well as variations such as the Leaky ReLU, which have already been studied in the literature. We report the results of extensive experiments on the well-known MNIST handwriting dataset.
Banerjee et al. (Thu,) studied this question.
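
The abstract does not give the exact form of the Generalized ReLU, so the following is only a minimal sketch of what parameterizing a single-output ReLU could look like. The function name `generalized_relu` and the parameters `neg_slope` and `pos_slope` are assumptions made for illustration and are not taken from the paper. The sketch shows how ReLU and Leaky ReLU fall out as special cases of one parameterized family, which is the property the abstract describes.

```python
import numpy as np


def generalized_relu(x, neg_slope=0.0, pos_slope=1.0):
    """Hypothetical two-parameter piecewise-linear activation.

    Illustrative assumption only, not the paper's definition:
        f(x) = pos_slope * x  if x > 0
        f(x) = neg_slope * x  otherwise
    """
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, pos_slope * x, neg_slope * x)


def relu(x):
    # Special case: zero slope on the negative side, unit slope on the positive side.
    return generalized_relu(x, neg_slope=0.0, pos_slope=1.0)


def leaky_relu(x, alpha=0.01):
    # Special case: small non-zero slope alpha on the negative side.
    return generalized_relu(x, neg_slope=alpha, pos_slope=1.0)
```

In a data-driven setting, parameters such as `neg_slope` could be chosen by validation performance or learned alongside the network weights. Which of these the paper does is not stated in the abstract.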