Deep Neural Networks (DNNs) have been highly successful at image classification across many industries. However, their computational complexity, latency, and efficiency requirements make them difficult to use in real-time applications. This study compares optimization methods aimed at making DNNs better suited to real-time image classification. We assess several approaches, namely weight pruning, quantization, low-rank factorization, and knowledge distillation, in terms of their effects on model accuracy, inference speed, and computational demands. We obtain experimental results using state-of-the-art DNN architectures such as ResNet and MobileNet on popular image datasets such as CIFAR-10 and ImageNet. Our results show that, although model efficiency and accuracy are not always compatible, pruning and quantization can greatly reduce inference time while keeping classification accuracy relatively stable. To inform the selection of optimization strategies for deploying DNNs in real-time, mission-critical applications such as autonomous driving, video surveillance, and augmented reality systems, we also investigate hybrid approaches that combine several optimizations to further decrease latency and improve performance in resource-constrained environments.
Meenu Vijarania (Sat,) studied this question.
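
To make the idea of a hybrid approach concrete, the following is a minimal sketch of one possible combination: magnitude-based weight pruning followed by symmetric 8-bit quantization. It is not the study's implementation. The function names (`magnitude_prune`, `quantize_int8`, `dequantize`), the 50% sparsity level, and the toy weight values are all assumptions chosen for illustration, and the code operates on a flat Python list rather than on a real ResNet or MobileNet layer.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate floating-point weights."""
    return [v * scale for v in q]


# Hypothetical toy weights, not taken from any trained model.
weights = [0.8, -0.05, 0.3, -1.27, 0.01, 0.6, -0.02, 0.45]

pruned = magnitude_prune(weights, sparsity=0.5)  # step 1: prune
q, scale = quantize_int8(pruned)                 # step 2: quantize
restored = dequantize(q, scale)

# Worst-case error introduced by quantization on the pruned weights.
max_error = max(abs(a - b) for a, b in zip(pruned, restored))
```

In this sketch, pruning reduces the number of non-zero weights and quantization reduces the storage needed per remaining weight; the rounding error per weight is bounded by about half of `scale`. Whether such a combination preserves accuracy on a real network depends on the model and dataset and would need to be measured.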