Emotional information in speech is conveyed through prosodic cues, including variations in fundamental frequency (F0 contour), intensity, and duration. Emotion perception has largely been studied behaviorally, but its underlying neural mechanisms are not well understood. The speech-evoked frequency-following response (FFR) is a non-invasive neural measure that reflects the encoding of speech acoustics in the auditory system. This study investigated the extent to which the FFR can represent prosody-related F0 contours and compared neural responses between male and female listeners. Sixteen normal-hearing adults underwent FFR recording in response to the word "balloon" spoken with sad and happy emotion by a male and a female talker. F0 tracking accuracy was quantified with a pitch-tracking algorithm, using root mean squared error (RMSE) and an accuracy percentage. The results showed that the FFR can track emotional F0 contours; however, the degree of accuracy is modulated by the acoustic characteristics of the emotion and the talker's voice. Sad emotional speech and the male talker's voice were associated with enhanced F0 tracking, consistent with their acoustic features. In contrast, F0 tracking accuracy did not differ by listener sex. These findings provide new insights into the use of the FFR as a neural measure for prosody assessment.
Karimi-Boroujeni et al. studied this question.
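As a rough illustration of the two tracking metrics named above, the following Python sketch compares a stimulus F0 contour with an FFR-derived F0 contour, frame by frame. It is only a sketch under stated assumptions, not the authors' analysis: the function name `f0_tracking_metrics`, the 10 Hz tolerance `tol_hz`, the definition of accuracy as the percentage of frames within that tolerance, and the example contours are all hypothetical, since the abstract does not specify how the accuracy percentage was computed.

```python
import math


def f0_tracking_metrics(stim_f0, resp_f0, tol_hz=10.0):
    """Return (rmse_hz, accuracy_percent) for two equal-length F0 contours.

    Assumptions (not taken from the source): both contours are sampled on
    the same frames, unvoiced or missing frames are marked as NaN, and a
    frame counts as "accurate" when the response F0 lies within tol_hz of
    the stimulus F0.
    """
    if len(stim_f0) != len(resp_f0):
        raise ValueError("contours must have the same number of frames")

    # Frame-wise error in Hz, skipping frames where either value is missing.
    errors = [
        r - s
        for s, r in zip(stim_f0, resp_f0)
        if not (math.isnan(s) or math.isnan(r))
    ]
    if not errors:
        raise ValueError("no frames with a valid F0 in both contours")

    # Root mean squared error across valid frames.
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

    # Percentage of valid frames tracked within the tolerance.
    hits = sum(1 for e in errors if abs(e) <= tol_hz)
    accuracy = 100.0 * hits / len(errors)
    return rmse, accuracy


# Hypothetical contours in Hz, for illustration only.
stimulus = [100.0, 110.0, 120.0, 130.0]
response = [102.0, 108.0, 120.0, 150.0]
rmse_hz, accuracy_pct = f0_tracking_metrics(stimulus, response)
print(f"RMSE: {rmse_hz:.2f} Hz, accuracy: {accuracy_pct:.1f}%")
```

Under these assumptions, a lower RMSE and a higher accuracy percentage both indicate that the neural response follows the stimulus F0 contour more closely.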