Voice Activity Detection (VAD) plays a critical role in cellular network efficiency by suppressing silent intervals and conserving bandwidth. However, traditional and conventional deep learning-based VAD methods often fail to adapt to dynamic network conditions, resulting in suboptimal performance under variable noise, jitter, and packet loss. This study proposes the Ekolama Loss–Optimised Deep Q-Network (ELO-DQN), a reinforcement learning model designed to improve detection accuracy, bandwidth usage, and speech quality across heterogeneous networks from GSM to 5G. ELO-DQN integrates a novel composite loss function that combines mean squared error, mean absolute error, and Huber loss through an adaptive exponential weighting mechanism to enhance training stability and robustness under non-stationary conditions. The proposed model was evaluated through extensive simulations across GSM, LTE, VoLTE, and VoNR network environments, incorporating realistic network impairments such as latency (45–320 ms), jitter (2–95 ms), and packet loss (up to 5%). Comparative analysis showed that ELO-DQN significantly outperformed conventional deep learning and baseline VAD approaches. It achieved a 67% reduction in mean latency (from 185 ± 62 ms to 60 ± 58 ms), a 27% decrease in jitter (from 49 ± 18 ms to 36 ± 15 ms), and a 28% reduction in packet loss (from 3.9% to 2.8%), with corresponding gains in the 95th percentile values. Furthermore, relative to the traditional and deep learning VAD baselines, ELO-DQN raised detection accuracy to as much as 94.8% (versus 87.2% for the baseline), precision to 94.1% (versus 85.5%), recall to 95.3% (versus 84.7%), and F1-score to 94.7% (versus 85.1%). Statistical validation via paired t-tests and Wilcoxon signed-rank tests confirmed the significance of these improvements (p < 0.05).
The model also enhanced bandwidth efficiency and maintained higher Mean Opinion Score (MOS) and Perceptual Evaluation of Speech Quality (PESQ) metrics, demonstrating superior resilience in low-SNR and high-traffic scenarios. These findings establish ELO-DQN as a robust, adaptive solution for voice activity detection in modern mobile networks, offering substantial benefits in spectral efficiency, user experience, and network scalability.
Ekolama et al. (Fri,) studied this question.
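
The composite loss named in the abstract (mean squared error, mean absolute error, and Huber loss combined through adaptive exponential weighting) can be illustrated with a minimal sketch. The abstract does not state the weighting rule, so everything specific here is an assumption made for illustration: the class name `CompositeLoss`, the parameters `beta`, `decay`, and `delta`, and the choice to weight each term by the normalised exponential of the negative running average of that term. It should not be read as the authors' formulation.

```python
import math


def mse(errors):
    """Mean squared error over a list of TD errors."""
    return sum(e * e for e in errors) / len(errors)


def mae(errors):
    """Mean absolute error over a list of TD errors."""
    return sum(abs(e) for e in errors) / len(errors)


def huber(errors, delta=1.0):
    """Huber loss: quadratic for |e| <= delta, linear beyond it."""
    total = 0.0
    for e in errors:
        a = abs(e)
        total += 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)
    return total / len(errors)


class CompositeLoss:
    """Hypothetical weighted blend of MSE, MAE, and Huber loss.

    ASSUMPTION: weights are exp(-beta * ema_i), normalised to sum to 1,
    where ema_i is an exponential moving average of loss term i. The
    source abstract does not specify this rule.
    """

    def __init__(self, beta=1.0, decay=0.9, delta=1.0):
        self.beta = beta      # sharpness of the exponential weighting
        self.decay = decay    # smoothing factor of the moving average
        self.delta = delta    # Huber transition point
        self.ema = None       # running average of each loss term

    def __call__(self, q_pred, q_target):
        # Requires non-empty, equal-length sequences of Q-values.
        errors = [p - t for p, t in zip(q_pred, q_target)]
        terms = [mse(errors), mae(errors), huber(errors, self.delta)]
        if self.ema is None:
            self.ema = list(terms)
        else:
            self.ema = [self.decay * m + (1.0 - self.decay) * t
                        for m, t in zip(self.ema, terms)]
        raw = [math.exp(-self.beta * m) for m in self.ema]
        norm = sum(raw)
        weights = [r / norm for r in raw]
        loss = sum(w * t for w, t in zip(weights, terms))
        return loss, weights


loss_fn = CompositeLoss()
loss, weights = loss_fn([1.0, 2.0, 3.0], [1.5, 2.0, 1.0])
```

Because the weights are normalised, the blended value always lies between the smallest and largest of the three terms. Under this assumed rule, a term that has recently been large (for example MSE when outliers dominate a batch) is down-weighted, which is one plausible way to obtain the training stability the abstract describes.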