What question did this study set out to answer?

The aim is to evaluate the feasibility of automating the identification of key medical interventions during mass casualty simulations using video analysis.

March 26, 2026

1671: Automating Identification of Successful Field Intervention in Mass Casualty Training: Video Pipeline

Key Points

Developed a framework using the Autodoc simulation dataset.
Analyzed the first 60 seconds of videos at 1 frame per second.
Utilized models like MediaPipe Pose and YOLO Nano X for intervention detection.
Employed the SNORKEL framework for weak supervision in label generation.
Applied RGB thresholds for detecting blood presence as cues for interventions.
MediaPipe model achieved nearly 80% accuracy in detecting interventions.
Probabilistic SNORKEL ensemble reached 65% accuracy for negative class identification.
YOLO X and zero-shot models showed moderate performance across both classes.
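
The frame sampling and colour-cue steps listed above can be illustrated with a minimal sketch. It is not the study's implementation. The function names, the assumed native frame rate, and every RGB cut-off value are hypothetical placeholders chosen for illustration, since the source does not report the thresholds used.

```python
# Illustrative sketch only: names and threshold values are assumptions,
# not values reported by the study.

def sample_frame_indices(native_fps, duration_s=60, sample_fps=1.0, total_frames=None):
    """Indices of the frames kept when the first `duration_s` seconds of a
    video recorded at `native_fps` are sampled at `sample_fps`."""
    n_samples = int(duration_s * sample_fps)
    indices = [int(round(k * native_fps / sample_fps)) for k in range(n_samples)]
    if total_frames is not None:
        # Drop indices that fall beyond the end of a short video.
        indices = [i for i in indices if i < total_frames]
    return indices


def is_blood_like(r, g, b, min_red=120, max_green=70, max_blue=70):
    """True if an RGB pixel is strongly red with low green and blue.
    The cut-offs are hypothetical and would need tuning on real footage."""
    return r >= min_red and g <= max_green and b <= max_blue


def blood_fraction(pixels, **thresholds):
    """Fraction of (r, g, b) pixels in a frame or region that pass the threshold."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    hits = sum(1 for (r, g, b) in pixels if is_blood_like(r, g, b, **thresholds))
    return hits / len(pixels)


if __name__ == "__main__":
    idx = sample_frame_indices(native_fps=30)
    print(len(idx), idx[:3])
    print(blood_fraction([(200, 20, 20), (50, 50, 50)]))
```

In practice the per-frame blood fraction would be one cue among several, and a fixed RGB rule is sensitive to lighting, so the thresholds here should be read as a starting point only.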

Abstract

Introduction: In mass casualty scenarios, patient outcomes depend critically on the speed, accuracy, and consistency of medical interventions. The timely delivery of care is paramount to effective crisis management. Evaluation during simulation training and real-world scenarios is marred by the Hawthorne effect and by the practical difficulty of evaluating performance, leading to design bias. Unmanned video recording may be a viable solution; combined with computer vision algorithms, it offers a pathway to automated analysis for objectively assessing performance, improving training and capacity building. We explore the feasibility of using few-shot learning to automate the identification of key interventions in mass casualty scenarios.
Methods: Using the Autodoc simulation dataset, we developed a framework for detecting tourniquet application, an early, rapid, and frequent intervention identified through exploratory analysis. The first 60 seconds of each simulation video were analysed at 1 fps. Zero-shot learning with models including You Only Look Once (YOLO) Nano X for general scene and object detection, MediaPipe Pose for analysing physical interactions between the caregiver and the mannequin, and generalist models such as Bidirectional Encoder Representations from Transformers (BERT) was combined with the SNORKEL framework to apply weak supervision, generating probabilistic binary labels from a small number of expert-annotated clips. RGB thresholds were applied to provide cues for the visual detection of blood, a key determinant of tourniquet application. The outcome was binary: whether or not a tourniquet was applied.
Results: The MediaPipe model alone demonstrated the highest performance at correctly recognising when an intervention is occurring, with almost 80% accuracy. For negative class identification, the probabilistic SNORKEL ensemble was the most proficient (65% accuracy). The YOLO X and zero-shot models showed more moderate, balanced performance across both classes.
Conclusions: Incorporating spatial anatomical cues improves the automated detection of interventions in mass casualty simulations, considerably improving performance with zero-shot models. Adding multi-label and continuous temporal segmentation in the future may provide a comprehensive analysis of crisis management.
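
The weak-supervision idea in the Methods, where several noisy cues are combined into a probabilistic binary label, can be sketched as follows. This is a deliberately simplified stand-in: it uses an unweighted vote over labeling functions rather than the learned label model that the SNORKEL framework provides. The labeling functions, feature names (`min_wrist_limb_dist`, `blood_fraction`, `detector_score`), and cut-offs are all hypothetical. The per-class accuracy helper shows one way to report positive-class and negative-class performance separately, as the Results do.

```python
# Simplified illustration of weak supervision with labeling functions.
# NOT Snorkel's label model: an unweighted vote is used instead.
# All feature names and cut-offs below are hypothetical.

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_hand_near_limb(clip):
    """Pose cue: vote positive if a caregiver wrist comes close to a limb."""
    return POSITIVE if clip["min_wrist_limb_dist"] < 0.1 else ABSTAIN


def lf_blood_visible(clip):
    """Colour cue: visible blood suggests a tourniquet is indicated."""
    return POSITIVE if clip["blood_fraction"] > 0.02 else NEGATIVE


def lf_detector(clip):
    """Object-detector cue: abstain when no score is available."""
    score = clip.get("detector_score")
    if score is None:
        return ABSTAIN
    return POSITIVE if score >= 0.5 else NEGATIVE


LABELING_FUNCTIONS = (lf_hand_near_limb, lf_blood_visible, lf_detector)


def vote_probability(votes, prior=0.5):
    """Probability of the positive class as the share of non-abstaining votes."""
    cast = [v for v in votes if v != ABSTAIN]
    if not cast:
        return prior  # every function abstained: fall back to the prior
    return sum(1 for v in cast if v == POSITIVE) / len(cast)


def label_clip(clip):
    """Apply every labeling function to one clip and aggregate the votes."""
    return vote_probability([lf(clip) for lf in LABELING_FUNCTIONS])


def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately within each true class."""
    result = {}
    for cls in (NEGATIVE, POSITIVE):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        result[cls] = (
            sum(1 for i in idx if y_pred[i] == cls) / len(idx) if idx else None
        )
    return result


if __name__ == "__main__":
    clip = {"min_wrist_limb_dist": 0.05, "blood_fraction": 0.0, "detector_score": 0.2}
    print(label_clip(clip))
    print(per_class_accuracy([1, 0, 1, 0], [1, 0, 0, 0]))
```

A learned label model would additionally estimate how reliable each labeling function is and weight its votes accordingly, which the unweighted vote above does not attempt.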


Cite This Study

Garg et al. studied this question.

synapsesocial.com/papers/69c4ccebfdc3bde4489188fa https://doi.org/10.1097/01.ccm.0001188680.44217.28
