In this article, we present a novel approach to detecting local changes in challenging video scenes by exploring the capabilities of an encoder-decoder network that employs a modified ResNet-152 architecture with a multi-scale feature extraction (MFE) framework. The proposed encoder consists of a modified ResNet-152 network in which the first two blocks are frozen and the weights of the later blocks are learned through transfer learning. This encoder reduces computational complexity and extracts both fine-scale and coarse-scale features. We propose an MFE block that hybridizes a pyramidal pooling architecture (PPA) with several atrous convolutional layers, using the high-level features from the encoder to extract multi-scale features. The PPA in the MFE block preserves the maximum value in each pooling region, which retains the contextual relationships between pixels in complex video frames and helps the model handle various challenging scenes. The proposed decoder consists of stacked transposed convolution layers that learn a mapping from feature space to image space and predict a score map. A threshold is then applied to the score map to obtain binary class labels: background and foreground. The performance of the proposed scheme is validated by comparing it against 31 state-of-the-art techniques, and the results are corroborated both qualitatively and quantitatively. Further, the efficacy of the proposed algorithm is verified on an unseen video setup, where it is found to provide better performance.
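Two of the steps described above, max pooling over regions at several scales and thresholding a score map into background and foreground labels, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the non-overlapping window scheme, the window sizes `(1, 2, 4)`, and the threshold `tau = 0.5` are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
from typing import Dict, List, Sequence

Grid = List[List[float]]


def max_pool2d(grid: Grid, k: int) -> Grid:
    """Non-overlapping k x k max pooling over a 2D grid.

    Trailing rows/columns that do not fill a whole window are dropped
    (an assumption; the paper's pooling configuration is not given).
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(grid[r + dr][c + dc] for dr in range(k) for dc in range(k))
            for c in range(0, cols - k + 1, k)
        ]
        for r in range(0, rows - k + 1, k)
    ]


def pyramid_max_pool(grid: Grid, window_sizes: Sequence[int] = (1, 2, 4)) -> Dict[int, Grid]:
    """Pool the same grid at several scales, keeping the maximum of each region.

    The window sizes are hypothetical; they only illustrate the multi-scale idea.
    """
    return {k: max_pool2d(grid, k) for k in window_sizes}


def threshold_score_map(score_map: Grid, tau: float = 0.5) -> List[List[int]]:
    """Map each score to 1 (foreground) if it is >= tau, else 0 (background).

    tau = 0.5 is an assumed default, not a value taken from the paper.
    """
    return [[1 if s >= tau else 0 for s in row] for row in score_map]
```

Calling `pyramid_max_pool` on a small score grid returns one pooled grid per window size, and `threshold_score_map` then converts any of those grids (or a full-resolution score map) into binary labels. In the described network these operations act on learned feature tensors rather than plain lists, so the sketch only conveys the arithmetic of each step.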
Panda et al. studied this question.