The efficiency of the proposed automatic speaker recognizer is evaluated on two speech databases. The feature vector consists of 21 mel-frequency cepstral coefficients (MFCCs) and up to three additional features derived from the amplitude spectrum. The additional features are the logarithm of the energy around the appropriate local maximum in the spectrum, the frequency of that maximum, and the logarithm of the energy of the maximum component in the spectrum across all frames of the observed signal. Closed-set speaker identification is tested on the Solo section of the CHAINS database and on a speech database with expressed emotions, developed within the S-ADAPT project. The maximum mean recognition accuracy on the CHAINS database is 97.11%, achieved with a feature vector of 21 MFCCs and two additional features. On the S-ADAPT database, a feature vector of 21 MFCCs achieves 98.65% on neutral speech and 98.72% on the entire database.
Jokić et al. studied this question.
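
The three additional spectral features described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which local maximum is the "appropriate" one or how wide the region "around" it is, so the function name `extra_spectral_features`, the search band `band_hz`, and the window `half_width` are all hypothetical choices made for illustration. The third feature is read here as a single value per signal, taken from the largest spectral component over all frames, which is also an assumption.

```python
import numpy as np


def extra_spectral_features(frames, sample_rate, band_hz=(50.0, 400.0),
                            half_width=2, eps=1e-12):
    """Sketch of three spectrum-derived features per frame.

    frames      : array of shape (n_frames, frame_len), one frame per row
    band_hz     : ASSUMED search band for the "appropriate" local maximum
    half_width  : ASSUMED number of bins on each side of the peak
    Returns an array of shape (n_frames, 3).
    """
    frames = np.asarray(frames, dtype=float)
    n_frames, frame_len = frames.shape

    # Amplitude spectrum of every frame and the frequency of each bin.
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

    # Bins that fall inside the assumed search band.
    in_band = np.flatnonzero((freqs >= band_hz[0]) & (freqs <= band_hz[1]))
    if in_band.size == 0:
        raise ValueError("band_hz contains no FFT bins")

    # Per frame: index of the largest in-band component.
    peak_bins = in_band[np.argmax(spectrum[:, in_band], axis=1)]

    feats = np.empty((n_frames, 3))
    for i, k in enumerate(peak_bins):
        lo = max(k - half_width, 0)
        hi = min(k + half_width + 1, spectrum.shape[1])
        # Feature 1: log energy in a small window around the peak.
        feats[i, 0] = np.log(np.sum(spectrum[i, lo:hi] ** 2) + eps)
        # Feature 2: frequency of the peak in Hz.
        feats[i, 1] = freqs[k]

    # Feature 3: log energy of the largest component over all frames
    # (one value per signal, repeated for every frame).
    feats[:, 2] = np.log(np.max(spectrum) ** 2 + eps)
    return feats
```

Under these assumptions, the returned columns could be appended to an `(n_frames, 21)` MFCC matrix with `np.hstack` to form feature vectors of 22 to 24 elements, depending on how many of the additional features are kept.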