In recent years, Large Language Models such as Generative Pre-trained Transformers (GPT) have revolutionized Artificial Intelligence (AI). These models are highly successful at imitating general intuitive knowledge. Humans apply such knowledge in different ways when assessing the urban sound environment. First, it helps to form expectations about the sound environment based on general knowledge of the place. Second, it allows one to assess the plausibility and consistency of a verbal description of a sound environment. Hence, we propose to combine a GPT with a sound event and scene recognition AI to (1) contrast the recognized sounds with expectations based on geographical information on traffic infrastructure and points of interest near the measurement location; and (2) create verbal soundscape annotations, including perceived liveliness, calmness, etc. Prompt engineering, that is, pre-conditioning the GPT and asking it precise questions, requires some domain knowledge and a precise definition of the objective. Results show that the labeling of sound events is improved; in particular, labels that a human would not use can be excluded by giving the GPT contextual knowledge of the location. They also show that a plausible soundscape description is obtained.
Botteldooren et al. studied this question.
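
The abstract's idea of pre-conditioning a GPT with location context and then asking precise questions could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `SoundEvent` type, the `build_prompt` function, the prompt wording, and the example labels and map features are all hypothetical, and no particular recognizer or GPT API is assumed. The code only assembles the prompt text.

```python
from dataclasses import dataclass


@dataclass
class SoundEvent:
    """One label produced by a sound event recognizer (hypothetical type)."""
    label: str
    score: float  # recognizer score, assumed to lie in [0, 1]


def build_prompt(events, nearby_features):
    """Assemble a prompt that pairs recognized sounds with location context.

    `events` is a list of SoundEvent; `nearby_features` is a list of strings
    describing traffic infrastructure and points of interest near the
    measurement location. Both the format and the wording are assumptions.
    """
    # List the highest-scoring labels first so they are easy to inspect.
    ranked = sorted(events, key=lambda e: e.score, reverse=True)
    sound_lines = "\n".join(f"- {e.label} (score {e.score:.2f})" for e in ranked)
    place_lines = "\n".join(f"- {f}" for f in nearby_features)
    return (
        "You are assessing an urban sound environment.\n"
        "Features near the measurement location:\n"
        f"{place_lines}\n"
        "Sounds reported by an automatic recognizer:\n"
        f"{sound_lines}\n"
        "1. Which reported sounds are implausible at this location?\n"
        "2. Describe the soundscape, including perceived liveliness "
        "and calmness.\n"
    )


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the paper.
    example_events = [
        SoundEvent("car passing", 0.90),
        SoundEvent("seagull", 0.10),
        SoundEvent("tram bell", 0.95),
    ]
    example_features = ["tram line", "primary school"]
    print(build_prompt(example_events, example_features))
```

In this sketch the two numbered questions mirror objectives (1) and (2) of the abstract; the resulting string would then be sent to whichever language model is in use.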