Speaker gender recognition (SGR) identifies a speaker’s gender from voice characteristics and is used in speech synthesis, voice assistants and human–computer interaction. Traditional methods rely only on features such as pitch, whereas recent approaches use deep learning to improve accuracy. However, several challenges remain, including robustness in noisy environments, handling of ambiguous voices and consistent accuracy across languages. Model bias and ethical concerns also pose obstacles to real-world deployment. To address these drawbacks, this paper proposes a speaker gender recognition technique using an Optimized Multi-Component Attention Graph Convolutional Neural Network with EfficientNetB7 (SGR-MAGCNN-EffNetB7). Speech data from the Mozilla Common Voice dataset are used. The data are passed to a feature extraction stage based on the Multi-Component Attention Graph Convolutional Neural Network (MAGCNN). The extracted features are then given to EfficientNetB7, which classifies the speaker's gender as male or female. EfficientNetB7 is integrated by replacing the convolutional layer of MAGCNN while retaining its dense layers for classification. Finally, the shrike optimization algorithm (SHOA) is proposed to optimize the weight parameters of MAGCNN-EffNetB7. The simulation outcomes demonstrate that the proposed SGR-MAGCNN-EffNetB7 approach achieves higher accuracy and precision than existing methods.
Gundal et al. (Mon,) studied this question.
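For context on the pitch-based baseline that the abstract contrasts with deep learning, the following is a minimal sketch of a traditional pitch-only classifier. It is not the proposed SGR-MAGCNN-EffNetB7 method. The function names, the synthetic sine-tone input, the 8 kHz sample rate and the 165 Hz decision threshold are all illustrative assumptions and do not come from the paper.

```python
import math


def synth_tone(freq_hz, sample_rate=8000, duration_s=0.1):
    """Generate a pure sine tone as a toy stand-in for a voiced speech frame."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_pitch(frame, sample_rate=8000, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    The autocorrelation is left unnormalized, so the shrinking overlap at
    larger lags slightly favors the shortest matching period.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def classify_by_pitch(frame, sample_rate=8000, threshold_hz=165.0):
    """Label a frame by comparing its pitch to an assumed threshold.

    threshold_hz is an illustrative value, not one taken from the paper.
    """
    pitch = estimate_pitch(frame, sample_rate)
    return "male" if pitch < threshold_hz else "female"


if __name__ == "__main__":
    for f0 in (120.0, 220.0):
        frame = synth_tone(f0)
        print(f0, round(estimate_pitch(frame), 1), classify_by_pitch(frame))
```

A single threshold on pitch is exactly the kind of rule that degrades on noisy recordings and on voices whose pitch falls near the boundary, which is the limitation the abstract cites as motivation for learned features.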