What question did this study set out to answer?

The research aims to improve unconstrained fall detection using a novel dataset and an advanced vision-language model.

March 26, 2026

Towards Unconstrained Fall Detection Using Vision Language Model: Dataset, Theory and Practices

Key Points

Introduction of HUST-FALL, a text-video dataset with diverse fall scenarios
Development of Action-R1, a lightweight model using textual guidance
Evaluation through cross-dataset tests against traditional approaches
Action-R1 achieved an average F1 score of 0.827 across three benchmarks in cross-dataset tests
Performance significantly exceeded conventional CNN/RNN-based methods
Competitive with MiniCPM-V 2.6 despite having only 1/16 of its parameters, surpassing it on UPFall by 116.22%

Abstract

Unconstrained fall detection is essential for real-world applications. However, it remains underexplored due to the scarcity of real-world fall data and the limited generalization ability of existing methods. To address these challenges, we first introduce HUST-FALL, a fine-grained text-video dataset for unconstrained fall detection, featuring diverse fall scenarios and rich semantic annotations. Building on this dataset, we propose Action-R1, a lightweight vision-language model that leverages structured textual guidance and reasoning to improve the understanding of fall events. In challenging cross-dataset tests, Action-R1 achieves an average F1 score of 0.827 on three benchmarks, significantly outperforming conventional CNN/RNN-based methods. Despite having only 1/16 the parameters, Action-R1 achieves competitive performance against MiniCPM-V 2.6, even surpassing it on UPFall by 116.22%. These results demonstrate that Action-R1 is a lightweight yet powerful solution for unconstrained fall detection in real-world scenarios.
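The abstract reports results as an average F1 score and as a relative gain over a baseline. The sketch below shows that arithmetic under stated assumptions: standard binary F1 over fall/non-fall predictions, an unweighted mean across the three benchmarks, and "surpassing by 116.22%" read as the relative improvement (new - baseline) / baseline. The function names are hypothetical, and this is not the authors' evaluation code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 = 2PR / (P + R) from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f1(scores: list[float]) -> float:
    """Unweighted mean of per-benchmark F1 scores (assumed averaging scheme)."""
    return sum(scores) / len(scores)


def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, expressed in percent."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical counts and scores for illustration only, not results from the paper.
    print(f1_score(tp=8, fp=2, fn=2))
    print(average_f1([0.7, 0.8, 0.9]))
    print(relative_gain_percent(new=0.8, baseline=0.4))
```

Under this reading, a 116.22% gain means Action-R1's F1 on UPFall is roughly 2.16 times that of MiniCPM-V 2.6; the per-benchmark scores themselves are not given in this summary.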
