What question did this study set out to answer?

This research aims to enhance voice activity detection and speech quality in cellular networks using a novel deep learning approach.

March 15, 2026, Open Access

Voice Activity Detection in Cellular Networks: An Ekolama Loss–Optimised Deep Q-Network Approach for Adaptive Bandwidth Allocation and Speech Quality Enhancement

Key Points

Developed the Ekolama Loss–Optimised Deep Q-Network (ELO-DQN) for voice activity detection (VAD).
Evaluated through simulations across GSM, LTE, VoLTE, and VoNR networks.
Integrated adaptive mechanisms to handle network impairments such as latency, jitter, and packet loss.
Achieved a 67% reduction in mean latency, from 185 ms to 60 ms.
Demonstrated a 27% decrease in jitter, from 49 ms to 36 ms.
Reduced packet loss from 3.9% to 2.8% and increased detection accuracy to 94.8%.
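
The percentages in the key points are relative reductions against the baseline mean. The short Python check below recomputes them from the rounded means quoted above; the function name and the dictionary layout are illustrative choices, not part of the study.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the percentage drop from the baseline value to the improved value."""
    return 100.0 * (baseline - improved) / baseline


# Rounded means as quoted in the key points: (baseline, ELO-DQN).
reported = {
    "mean latency (ms)": (185.0, 60.0),
    "jitter (ms)": (49.0, 36.0),
    "packet loss (%)": (3.9, 2.8),
}

for name, (baseline, improved) in reported.items():
    print(f"{name}: {relative_reduction(baseline, improved):.1f}% reduction")
```

This gives roughly 67.6%, 26.5%, and 28.2%, close to the reported 67%, 27%, and 28%.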

Abstract

Voice Activity Detection (VAD) plays a critical role in cellular network efficiency by suppressing silent intervals and conserving bandwidth. However, both traditional VAD methods and conventional deep learning-based VAD methods often fail to adapt to dynamic network conditions, resulting in suboptimal performance under variable noise, jitter, and packet loss. This study proposes the Ekolama Loss–Optimised Deep Q-Network (ELO-DQN), a reinforcement learning model designed to improve detection accuracy, bandwidth usage, and speech quality across heterogeneous networks from GSM to 5G. ELO-DQN uses a novel composite loss function that combines mean squared error, mean absolute error, and Huber loss through an adaptive exponential weighting mechanism to enhance training stability and robustness under non-stationary conditions. The proposed model was evaluated through extensive simulations across GSM, LTE, VoLTE, and VoNR network environments, incorporating realistic network impairments such as latency (45–320 ms), jitter (2–95 ms), and packet loss (up to 5%). Comparative analysis showed that ELO-DQN significantly outperformed conventional deep learning and baseline VAD approaches. It achieved a 67% reduction in mean latency (from 185 ± 62 ms to 60 ± 58 ms), a 27% decrease in jitter (from 49 ± 18 ms to 36 ± 15 ms), and a 28% reduction in packet loss (from 3.9% to 2.8%), with corresponding gains in the 95th percentile values. ELO-DQN also improved on the traditional and deep learning VAD baselines in detection accuracy (94.8% versus 87.2%), precision (94.1% versus 85.5%), recall (95.3% versus 84.7%), and F1-score (94.7% versus 85.1%). Statistical validation via paired t-tests and Wilcoxon signed-rank tests confirmed the significance of these improvements (p < 0.05).
The model also enhanced bandwidth efficiency and maintained higher Mean Opinion Score (MOS) and Perceptual Evaluation of Speech Quality (PESQ) metrics, demonstrating superior resilience in low-SNR and high-traffic scenarios. These findings establish ELO-DQN as a robust, adaptive solution for voice activity detection in modern mobile networks, offering substantial benefits in spectral efficiency, user experience, and network scalability.
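
The abstract names the ingredients of the composite loss (mean squared error, mean absolute error, Huber loss, and an adaptive exponential weighting) but does not give its formula. The Python sketch below shows one way such a blend could be built over temporal-difference errors. It is an assumption-laden illustration rather than the authors' definition: the class name `CompositeLoss`, the parameters `beta`, `decay`, and `delta`, and the rule that weights each term by the exponential of its negated running average are all hypothetical.

```python
import math


def huber(err: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic near zero, linear beyond delta."""
    a = abs(err)
    return 0.5 * err * err if a <= delta else delta * (a - 0.5 * delta)


class CompositeLoss:
    """Hypothetical blend of MSE, MAE and Huber terms (not the paper's formula)."""

    def __init__(self, beta: float = 1.0, decay: float = 0.9, delta: float = 1.0):
        self.beta = beta      # assumed sharpness of the exponential weighting
        self.decay = decay    # assumed smoothing factor for the running averages
        self.delta = delta    # Huber threshold
        self.ema = None       # running average of each loss component

    def __call__(self, q_pred, q_target):
        errs = [t - p for p, t in zip(q_pred, q_target)]
        n = len(errs)
        parts = [
            sum(e * e for e in errs) / n,                # MSE
            sum(abs(e) for e in errs) / n,               # MAE
            sum(huber(e, self.delta) for e in errs) / n  # Huber
        ]
        # Track each component with an exponential moving average.
        if self.ema is None:
            self.ema = list(parts)
        else:
            self.ema = [self.decay * m + (1.0 - self.decay) * p
                        for m, p in zip(self.ema, parts)]
        # Assumed rule: components with a smaller running loss get more weight.
        raw = [math.exp(-self.beta * m) for m in self.ema]
        total = sum(raw)
        weights = [r / total for r in raw]
        loss = sum(w * p for w, p in zip(weights, parts))
        return loss, weights


# Example batch of predicted and target Q-values (made-up numbers).
loss_fn = CompositeLoss()
loss, weights = loss_fn([0.2, 1.5, -0.3], [0.0, 4.0, -0.1])
print(f"loss={loss:.4f}, weights={[round(w, 3) for w in weights]}")
```

Because the weights are normalised to sum to one, the blended loss always lies between the smallest and largest of the three terms, so a single outlier-dominated term cannot blow it up. In a gradient-based trainer the weights would normally be treated as constants during backpropagation; that is likewise an assumption of this sketch.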
