Vision-language models are of interest in various domains, including automated driving, where computer vision techniques can accurately detect road users, but where the vehicle sometimes fails to understand context. This study examined the effectiveness of GPT-4V in predicting the level of 'risk' in traffic images as assessed by humans. We used 210 static images taken from a moving vehicle, each previously rated by approximately 650 people. Based on psychometric construct theory and using insights from the self-consistency prompting method, we formulated three hypotheses: (i) repeating the prompt under effectively identical conditions increases validity, (ii) varying the prompt text and extracting a total score increases validity compared to using a single prompt, and (iii) in a multiple regression analysis, the incorporation of object detection features, alongside the GPT-4V-based risk rating, significantly contributes to improving the model's validity. Validity was quantified by the correlation coefficient with human risk scores, across the 210 images. The results confirmed the three hypotheses. The eventual validity coefficient was
Tom Driessen
Delft University of Technology
Dimitra Dodou
Delft University of Technology
Pavlo Bazilinskyy
California Maritime Academy
Royal Society Open Science
Delft University of Technology
Eindhoven University of Technology
Driessen et al. studied this question.
synapsesocial.com/papers/68e6c02bb6db64358763f5fa — DOI: https://doi.org/10.1098/rsos.231676