ABSTRACT The rapid growth of artificial intelligence (AI) has accelerated its integration into embedded systems, enabling real‐time intelligence at the edge with reduced latency, improved privacy, and lower power consumption. This paradigm, known as Embedded AI, deploys machine learning and deep learning models directly on resource‐constrained platforms such as microcontrollers, FPGAs, and system‐on‐chip devices. This study presents a structured optimization and benchmarking framework for lightweight neural networks targeting heterogeneous embedded hardware, including CPU, GPU, TPU, and MCU architectures. Model compression techniques such as quantization, pruning, and mixed‐precision computation are applied to reduce memory footprint and computational complexity while preserving classification accuracy. Performance is evaluated under consistent experimental conditions using three benchmarking metrics: inference latency, model size, and power consumption. Results show that the optimized lightweight models significantly improve computational and energy efficiency across edge deployment platforms. The proposed framework provides practical guidance for selecting suitable neural network configurations for real‐time embedded AI applications.
Fridous et al. (Fri,) studied this question.
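To make two of the compression techniques named in the abstract concrete, the following is a minimal pure-Python sketch of symmetric int8 quantization and magnitude-based pruning. It is an illustration under stated assumptions, not the framework used in the study: the function names (`quantize_int8`, `dequantize`, `prune_by_magnitude`), the per-tensor scaling scheme, and the example weights are all hypothetical.

```python
# Illustrative sketch only. Function names, the per-tensor symmetric scheme,
# and the example weights are assumptions, not taken from the study.

def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.50, -1.27, 0.03, 0.90, -0.02, 0.31]  # hypothetical layer weights

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

sparse = prune_by_magnitude(weights, 0.5)
```

In this sketch each weight is stored as one signed byte plus a single shared scale, which is where the memory reduction comes from, and pruning half of the weights leaves only the three largest-magnitude values nonzero. Production toolchains typically add calibration data, per-channel scales, and fine-tuning after pruning, none of which is shown here.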