Learn To Pay Attention

Abstract

We propose an end-to-end-trainable attention module for convolutional neural network (CNN) architectures built for image classification. The module takes as input the 2D feature vector maps which form the intermediate representations of the input image at different stages in the CNN pipeline, and outputs a 2D matrix of scores for each map. Standard CNN architectures are modified through the incorporation of this module, and trained under the constraint that a convex combination of the intermediate 2D feature vectors, as parameterised by the score matrices, must alone be used for classification. Incentivised to amplify the relevant and suppress the irrelevant or misleading, the scores thus assume the role of attention values. Our experimental observations provide clear evidence to this effect: the learned attention maps neatly highlight the regions of interest while suppressing background clutter. Consequently, the proposed function is able to bootstrap standard CNN architectures for the task of image classification, demonstrating superior generalisation over 6 unseen benchmark datasets. When binarised, our attention maps outperform other CNN-based attention maps, traditional saliency maps, and top object proposals for weakly supervised segmentation as demonstrated on the Object Discovery dataset. We also demonstrate improved robustness against the fast gradient sign method of adversarial attack.
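The constraint described in the abstract, that only a convex combination of the local feature vectors may be used for classification, can be illustrated with a small NumPy sketch. The abstract does not say how the scores are computed or normalised, so two assumptions are made here: each location is scored by a dot product with a hypothetical `query` vector, and the scores are normalised with a softmax so that they are non-negative and sum to one. The function name `attention_pool` and the array shapes are illustrative only and are not taken from the paper.

```python
import numpy as np

def attention_pool(features, query):
    """Pool an (H, W, C) feature map into one C-vector using attention scores.

    `query` is a hypothetical (C,) scoring vector (an assumption of this sketch).
    Returns the (H, W) score matrix and the pooled (C,) vector.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)      # one feature vector per spatial location
    logits = flat @ query                  # assumed compatibility score per location
    logits = logits - logits.max()         # shift for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()               # softmax: non-negative, sums to 1
    pooled = weights @ flat                # convex combination of the local vectors
    return weights.reshape(h, w), pooled

# Toy input: a 4x4 map of 8-dimensional feature vectors.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 8))
query = rng.normal(size=8)
scores, pooled = attention_pool(features, query)
```

In this sketch `scores` plays the role of the 2D attention map for one stage of the network, and `pooled` is the only quantity a classifier would see. Because the weights are convex, locations with low scores contribute little to `pooled`, which is the mechanism the abstract credits for suppressing background clutter.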

Authors

Saumya Jetley

Université Paris-Saclay

Nicholas A. Lord

University of Florida Health

Namhoon Lee

Anyang University

Institutions

University of Oxford

Carnegie Mellon University

Cite this study

Jetley et al. studied this question.

synapsesocial.com/papers/6a109ab157bfcc7264600963 — DOI: https://doi.org/10.48550/arxiv.1804.02391

Also consider

Synapse has enriched 5 closely related papers on similar questions. Consider them for comparative context:

Human action recognition by learning bases of action attributes and parts · 2011 · 636 citations
Caltech-256 Object Category Dataset · 2007 · 2,396 citations
Reading digits in natural images with unsupervised feature learning · 2024 · 4,565 citations
DeepMask: Masking DNN Models for robustness against adversarial samples · 2017 · 10 citations
Recurrent Models of Visual Attention · 2014 · 1,002 citations
