What question did this study set out to answer?

The study aims to enhance cat behavior recognition in videos despite unstable visibility of body parts.

May 26, 2026 | Open Access

PMTNet: A Part-Centric Missing-Aware Temporal Network for Cat Behavior Recognition in Unconstrained Videos

Key Points

The study aims to enhance cat behavior recognition in videos despite unstable visibility of body parts.
Developed PMTNet, a part-centric temporal network for recognizing cat behavior.
Used a part-detection dataset of 4000 training images and 500 validation images, together with a behavior dataset of 1283 video clips across five categories.
Evaluated the performance based on Top-1 Accuracy and Macro-F1 scores.
Achieved 93.1% Top-1 Accuracy and 90.9% Macro-F1 in the best-performing setting.
Outperformed representative end-to-end video recognition baselines on the present dataset.
Ablation studies suggest that detector choice in the video domain, complementary part cues, and missing-aware fusion all contribute to recognition performance.
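
For readers unfamiliar with the two reported metrics, the sketch below shows how Top-1 Accuracy and Macro-F1 are conventionally computed from clip-level labels. It is a minimal illustration, not the authors' evaluation code: the function names and the integer class ids in the usage note are assumptions made for this example, and the paper's five category names are not reproduced here.

```python
from collections import Counter


def top1_accuracy(y_true, y_pred):
    """Fraction of clips whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores.

    Every class counts equally, so rare behaviors influence the score
    as much as frequent ones.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the clip was not p
            fn[t] += 1  # true class t was missed

    f1_scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

With the made-up labels `y_true = [0, 0, 1, 2]` and `y_pred = [0, 1, 1, 2]`, `top1_accuracy` returns 0.75 and `macro_f1` returns about 0.778. Macro-F1 can fall below accuracy when errors concentrate in a few classes, which is why summaries like this one report both.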

Abstract

Cat behavior recognition in unconstrained videos is important for animal welfare monitoring and veterinary assessment, yet remains challenging because behavior cues are often carried by highly deformable and intermittently visible parts such as the head and tail. This study aims to improve clip-level cat behavior recognition under unstable part visibility in real-world videos. We propose PMTNet, a part-centric temporal network for cat behavior recognition under unstable part visibility. The framework first detects the cat body, head, and tail using a DEIM-based detector, then selects a detector according to video-domain continuity and stability, and finally models behavior from ROI appearance features and explicit geometric motion cues. The framework was developed and evaluated using a part-detection dataset of 4000 training images and 500 validation images, together with a cat behavior dataset of 1283 video clips across five categories. In the best-performing setting, PMTNet achieved 93.1% Top-1 Accuracy and 90.9% Macro-F1. Ablation studies further suggest that detector choice in the video domain, complementary part cues, and missing-aware fusion all contribute to the final recognition performance. On the present dataset, PMTNet also outperformed representative end-to-end video recognition baselines. These results support the use of part-centric temporal modeling for cat behavior recognition in unconstrained real-world videos.
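
The abstract does not say how missing-aware fusion is implemented. As a rough illustration of the general idea only, the sketch below averages per-part feature vectors over the parts that were actually detected in a frame and appends the visibility flags, so a downstream temporal model can tell "tail not detected" apart from "tail feature near zero". The function name, the plain-list feature format, and the masked-mean rule are all assumptions made for this example and should not be read as PMTNet's actual design.

```python
def missing_aware_fuse(part_features, visible):
    """Fuse per-part features for one frame while ignoring missing parts.

    part_features: dict mapping a part name to a feature vector (list of floats).
                   Entries for undetected parts may hold any placeholder values.
    visible:       dict mapping the same part names to True/False.

    Returns the mean feature over visible parts, followed by one visibility
    flag per part (in sorted part-name order).
    """
    names = sorted(part_features)
    if not names:
        raise ValueError("part_features must contain at least one part")
    dim = len(part_features[names[0]])
    if any(len(part_features[n]) != dim for n in names):
        raise ValueError("all part feature vectors must share one dimension")

    # Keep only parts that were detected in this frame.
    kept = [part_features[n] for n in names if visible[n]]
    if kept:
        fused = [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
    else:
        # No part visible: fall back to a zero vector; the flags carry that fact.
        fused = [0.0] * dim

    flags = [1.0 if visible[n] else 0.0 for n in names]
    return fused + flags


# Hypothetical frame: body and head detected, tail occluded.
frame = {"body": [1.0, 2.0], "head": [3.0, 4.0], "tail": [100.0, 100.0]}
mask = {"body": True, "head": True, "tail": False}
fused = missing_aware_fuse(frame, mask)
```

Here the placeholder tail values are excluded from the average, so they cannot distort the fused feature, while the trailing flags record which parts contributed.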

Cite This Study

Tu et al. studied this question.

synapsesocial.com/papers/6a1539ccb5d9c58d83e8cdbd https://doi.org/10.3390/ani16111589