Full-image dependencies provide contextual information that benefits visual understanding problems. In this work, we propose a Criss-Cross Network (CCNet) for obtaining such contextual information more effectively and efficiently. Concretely, for each pixel, a novel criss-cross attention module in CCNet harvests the contextual information of all the pixels on its criss-cross path. By applying a further recurrent operation, each pixel can finally capture the full-image dependencies from all pixels. Overall, CCNet has the following merits: 1) GPU memory friendly. Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory. 2) High computational efficiency. In computing full-image dependencies, the recurrent criss-cross attention reduces FLOPs by about 85% relative to the non-local block. 3) State-of-the-art performance. We conduct extensive experiments on the popular semantic segmentation benchmarks Cityscapes and ADE20K, and on the instance segmentation benchmark COCO. In particular, CCNet achieves mIoU scores of 81.4 and 45.22 on the Cityscapes test set and the ADE20K validation set, respectively, which are new state-of-the-art results. The source code is available at https://github.com/speedinghzl/CCNet.
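The recurrence claim in the abstract can be checked with a small connectivity sketch. The Python below is illustrative only and is not taken from the CCNet repository: it assumes that a pixel's criss-cross path is its full row plus its full column, and all function names (`criss_cross_path`, `receptive_field`, `affinity_counts`) are hypothetical. It tracks which pixels can influence a given position after each pass and counts pairwise affinities for a dense attention map versus criss-cross attention. It does not compute attention weights and does not attempt to reproduce the reported memory or FLOPs figures.

```python
def criss_cross_path(i, j, height, width):
    """Positions sharing a row or a column with (i, j), including (i, j) itself."""
    row = {(i, c) for c in range(width)}
    col = {(r, j) for r in range(height)}
    return row | col


def receptive_field(i, j, height, width, loops):
    """Pixels that can influence (i, j) after `loops` criss-cross passes."""
    reached = {(i, j)}
    for _ in range(loops):
        expanded = set()
        for (r, c) in reached:
            # Each already-reached pixel gathered context from its own path
            # in the previous pass, so that context is now reachable too.
            expanded |= criss_cross_path(r, c, height, width)
        reached = expanded
    return reached


def affinity_counts(height, width, loops=2):
    """Pairwise affinities for a dense map versus `loops` criss-cross passes."""
    n = height * width
    dense = n * n                                   # every pixel attends to every pixel
    criss_cross = loops * n * (height + width - 1)  # every pixel attends to its path
    return dense, criss_cross


if __name__ == "__main__":
    h, w = 5, 7
    print(len(receptive_field(2, 3, h, w, loops=1)))  # one pass: H + W - 1 pixels
    print(len(receptive_field(2, 3, h, w, loops=2)))  # two passes: all H * W pixels
    print(affinity_counts(h, w, loops=2))
```

Under these assumptions, one pass reaches only the H + W - 1 positions on the path, while a second pass reaches every position, because any pixel (u, v) shares a column with (i, v), which lies on the row of (i, j). The affinity count grows as H·W·(H + W - 1) per pass rather than (H·W)², which is the intuition behind the efficiency claims.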
Zilong Huang
Xinggang Wang
Lichao Huang
University of Illinois Urbana-Champaign
Huazhong University of Science and Technology
Horizon Robotics (China)
Huang et al. studied this question.
www.synapsesocial.com/papers/6942daf4ca2dd862627d75d0 — DOI: https://doi.org/10.1109/iccv.2019.00069