Intermediate flow estimation is a key component of video frame interpolation (VFI). Most previous works derive the intermediate flow by interpolation, assuming locally linear motion. However, this approach is not effective for extreme motion. In this work, we assume that the motion trajectory of an object is determined by its appearance characteristics. Based on this assumption, we propose a new intermediate flow estimation method that obtains the motion features of intermediate frames from image appearance features and inter-frame motion features. In addition, to fully extract inter-frame features, we revisit how VFI differs from previous applications of the Swin Transformer and compute appearance and motion features within an adaptive neighborhood by cyclically shifting the window. Experimental results show that our method achieves state-of-the-art performance on different datasets for both fixed-time and arbitrary-time interpolation. Moreover, when handling videos with extremely large motion, our method outperforms models that require a sequence of four input frames. The source code is available at https://github.com/chen12304/IFE-VFI.
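For context, the linear-motion baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the common approximation in which the flow from the intermediate time t to each input frame is a scaled copy of the frame-0-to-frame-1 flow; it is not the method proposed here. The function name `linear_intermediate_flows` and the nested-list flow layout are assumptions made for this sketch.

```python
from typing import List, Tuple

# Assumed layout for this sketch: flow[y][x] = (dx, dy) displacement in pixels.
Flow = List[List[Tuple[float, float]]]


def linear_intermediate_flows(flow_0_to_1: Flow, t: float) -> Tuple[Flow, Flow]:
    """Approximate the flows from time t to frames 0 and 1 under linear motion.

    Assumes every pixel moves at constant velocity between the two frames and
    reuses the flow stored at the same pixel location, so that
        F_{t->0} ~= -t * F_{0->1}
        F_{t->1} ~= (1 - t) * F_{0->1}
    Both assumptions tend to break down for large or non-linear motion.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    # Backward flow to frame 0: reverse direction, scaled by elapsed time t.
    flow_t_to_0 = [[(-t * dx, -t * dy) for dx, dy in row] for row in flow_0_to_1]
    # Forward flow to frame 1: same direction, scaled by remaining time 1 - t.
    flow_t_to_1 = [
        [((1.0 - t) * dx, (1.0 - t) * dy) for dx, dy in row] for row in flow_0_to_1
    ]
    return flow_t_to_0, flow_t_to_1
```

For example, a pixel that moves by (4, -2) between the two frames is assigned a flow of (-1, 0.5) back to frame 0 and (3, -1.5) forward to frame 1 at t = 0.25. An object that accelerates or turns between the frames violates this scaling, which is the failure case the abstract refers to.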
Chen et al. studied this question.