Event cameras are bio-inspired vision sensors that offer low latency, low power consumption, and high dynamic range, capturing motion with microsecond-level precision via a per-event triggering mechanism. Despite these advantages, the inherent sparsity and lack of color in event data hinder direct analysis and call for deep learning approaches tailored to events. To achieve low-latency, high-precision motion segmentation for indoor robotic applications, this paper introduces a dual-branch decoupled CNN framework. Specifically, Principal Component Analysis (PCA) is used to project 3D event point clouds into 2D motion trend maps, capturing local motion priors while suppressing ambiguity in structured environments. Concurrently, an Event Leaky Integration (ELI) model, inspired by biological membrane potentials, is designed to enhance the structural representation of sparse events. Within this framework, one branch performs motion validation and the other performs shape extraction; their outputs are fused by a Spatial Gated Fusion (SGF) module to suppress static background interference. Experiments show that, with an input window of only 10 ms, the proposed method achieves a 77% average mIoU across five indoor test scenarios from the EV-IMO dataset, with an inference latency of 10 ms per frame. Compared with state-of-the-art methods such as MSRNN and GCN, which require 30–300 ms event slices, our framework achieves a favorable trade-off between computational efficiency and segmentation accuracy, maintaining competitive performance under ultra-short time windows for indoor event-based motion processing.
Yin et al. (Thu,) studied this question.
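To make the idea behind a leaky-integration event representation concrete, the following is a minimal sketch, not the ELI model as defined in the paper. It assumes that each event adds a unit impulse to a per-pixel "membrane potential" that decays exponentially with a time constant; the function name `leaky_integrate`, the parameter `tau_us`, the `(x, y, t)` event layout, and the choice to ignore polarity are all illustrative assumptions.

```python
import numpy as np


def leaky_integrate(events, height, width, tau_us=5000.0, t_end_us=None):
    """Accumulate events into a 2D map with exponential leak.

    Assumed (hypothetical) layout: `events` has shape (N, 3) with columns
    x, y, t, where t is a timestamp in microseconds. Polarity is ignored.
    """
    events = np.asarray(events, dtype=np.float64)
    potential = np.zeros((height, width), dtype=np.float64)
    if events.shape[0] == 0:
        return potential

    x = events[:, 0].astype(np.intp)
    y = events[:, 1].astype(np.intp)
    t = events[:, 2]
    if t_end_us is None:
        t_end_us = t.max()  # read the potential out at the last event

    # A unit impulse at time t_i that leaks with time constant tau
    # contributes exp(-(t_end - t_i) / tau) at readout time t_end.
    weights = np.exp(-(t_end_us - t) / tau_us)

    # np.add.at accumulates correctly when several events hit one pixel.
    np.add.at(potential, (y, x), weights)
    return potential


# Example: three events inside a short window on a 4 x 5 sensor.
example_events = np.array([
    [1, 2, 0.0],
    [1, 2, 5000.0],
    [3, 0, 5000.0],
])
example_map = leaky_integrate(example_events, height=4, width=5)
```

In this sketch, recent events dominate the map while older ones fade, so pixels that fire repeatedly within the window (such as object contours) accumulate higher values than isolated noise events. The closed-form weighting is equivalent to updating the potential event by event and decaying it between events.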