Key points are not available for this paper at this time.
In this paper we propose a convolutional neural network (CNN), which allows to identify corresponding patches of very high resolution (VHR) optical and SAR imagery of complex urban scenes. Instead of a siamese architecture as conventionally used in CNNs designed for image matching, we resort to a pseudo-siamese configuration with no interconnection between the two streams for SAR and optical imagery. The network is trained with automatically generated training data and does not resort to any hand-crafted features. First evaluations show that the network is able to predict corresponding patches with high accuracy, thus indicating great potential for further development to a generalized multi-sensor matching procedure.
Mou et al. (Wed,) studied this question.