In this paper, we tackle open-set temporal action segmentation, which aims to identify unknown frames while accurately segmenting known actions in the temporal domain. Existing open-set methods struggle to identify unknown frames because these frames are hard to distinguish from ambiguous known frames during action transitions, which leads to significant performance degradation. To address this, we propose the action distribution flow, which models the transitions between consecutive actions in an action sequence to capture the inherent feature discrepancies between unknown and known frames. Specifically, our method first models the distributions of known actions from the training data and then interpolates between these distributions along the optimal transport path for each pair of consecutive actions in a testing video. By evaluating the likelihood of testing frames under the modeled action distribution flow, our approach identifies unknown frames without requiring additional training or prior knowledge of the unknown data. Extensive experiments on open-set versions of the GTEA, 50Salads, and Breakfast datasets show that the proposed method outperforms existing approaches on all evaluation metrics.
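The abstract does not specify the distribution family, the features, or the decision rule. The following is therefore only a minimal sketch under stated assumptions. Each known action is modeled as a single Gaussian over frame features. The optimal transport path between two consecutive actions is taken to be the closed-form 2-Wasserstein (McCann displacement) interpolation between their Gaussians. A frame is scored by its best log-likelihood along that path and flagged as unknown below a threshold. All function names, the number of interpolation steps, and the threshold value are hypothetical. The pair of consecutive actions is assumed to be given, for example from a predicted segmentation.

```python
import numpy as np

def sym_sqrt(a):
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def fit_gaussian(features, eps=1e-3):
    # Assumption: one Gaussian per known action, fitted on its training features (N x D).
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, cov

def ot_map(cov0, cov1):
    # Linear optimal transport map T (2-Wasserstein cost) between Gaussians with these covariances.
    s0 = sym_sqrt(cov0)
    s0_inv = np.linalg.inv(s0)
    return s0_inv @ sym_sqrt(s0 @ cov1 @ s0) @ s0_inv

def interpolate(g0, g1, t):
    # McCann displacement interpolation: the Gaussian at time t on the W2 geodesic from g0 to g1.
    (m0, c0), (m1, c1) = g0, g1
    a = (1.0 - t) * np.eye(len(m0)) + t * ot_map(c0, c1)
    return (1.0 - t) * m0 + t * m1, a @ c0 @ a.T

def log_gauss(x, mean, cov):
    # Log-density of a multivariate Gaussian at x.
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(mean) * np.log(2.0 * np.pi) + logdet + maha)

def flow_log_likelihood(x, g0, g1, steps=11):
    # Best log-likelihood of frame x over distributions sampled along the flow (t = 0 ... 1).
    return max(log_gauss(x, *interpolate(g0, g1, t)) for t in np.linspace(0.0, 1.0, steps))

def is_unknown(x, g0, g1, threshold=-20.0):
    # Hypothetical decision rule: low likelihood along the flow -> unknown frame.
    return flow_log_likelihood(x, g0, g1) < threshold

# Toy usage with synthetic 2-D features for two consecutive known actions.
rng = np.random.default_rng(0)
g_a = fit_gaussian(rng.normal([0.0, 0.0], 0.5, size=(200, 2)))
g_b = fit_gaussian(rng.normal([4.0, 0.0], 0.5, size=(200, 2)))
transition = np.array([2.0, 0.0])  # ambiguous frame midway between the two actions
outlier = np.array([2.0, 6.0])     # frame far from both actions and from the transition
print(is_unknown(transition, g_a, g_b))  # False
print(is_unknown(outlier, g_a, g_b))     # True
```

In this toy setting, the transition frame has a low likelihood under either endpoint Gaussian alone. It scores well along the interpolated path, however, so it is not mistaken for an unknown frame. This mirrors the failure mode the abstract attributes to existing methods.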
Zhang et al. (Thu,) studied this question.