February 2, 2022

ChineseTR: A weakly supervised toponym recognition architecture based on automatic training data generator and deep neural network

Abstract

Toponym recognition extracts toponyms from natural language texts and is a fundamental task in ubiquitous geographic information applications. Existing state‐of‐the‐art toponym recognition methods mainly rely on supervised learning (i.e., deep‐learning‐based approaches), with parameters learned from massive labeled datasets that must be annotated manually. This is a great inconvenience when a model must be retrained to fit texts from different domains, especially social media messages. To address this issue, this article proposes a weakly supervised Chinese toponym recognition (ChineseTR) architecture with two components: a training dataset creator that automatically generates training datasets from word collections and associated word frequencies drawn from various texts, and an extension recognizer that employs a basic bidirectional recurrent neural network with features designed specifically for toponym recognition. The results show that the proposed ChineseTR achieves a 0.76 F1 score in a corpus with a 0.718 out‐of‐vocabulary rate and a 0.903 in‐vocabulary rate. All comparative experiments demonstrate that ChineseTR is an effective and scalable architecture for recognizing toponyms.
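The abstract does not describe how the training dataset creator works internally, so the following is only a minimal sketch of the general idea of building labeled data from word collections and word frequencies. Everything in it is an assumption made for illustration: the function names `bio_tags` and `generate_sample`, the character-level B-LOC/I-LOC/O tagging scheme, the `toponym_prob` parameter, and the toy word lists are hypothetical and are not taken from the paper.

```python
import random


def bio_tags(word, is_toponym):
    """Return character-level BIO tags for one word (assumed tagging scheme)."""
    if not is_toponym:
        return ["O"] * len(word)
    return ["B-LOC"] + ["I-LOC"] * (len(word) - 1)


def generate_sample(toponyms, common_words, length=6, toponym_prob=0.3, rng=None):
    """Build one pseudo-sentence with labels, without manual annotation.

    toponyms and common_words map each word to its frequency; words are
    sampled in proportion to that frequency (an illustrative assumption).
    """
    rng = rng or random.Random()
    chars, tags = [], []
    for _ in range(length):
        # Decide whether the next word is a toponym or an ordinary word.
        is_toponym = rng.random() < toponym_prob
        pool = toponyms if is_toponym else common_words
        words = list(pool)
        weights = [pool[w] for w in words]
        # Frequency-weighted sampling from the chosen word collection.
        word = rng.choices(words, weights=weights, k=1)[0]
        chars.extend(word)
        tags.extend(bio_tags(word, is_toponym))
    return chars, tags


# Toy word collections with made-up frequencies, for demonstration only.
toponyms = {"北京": 50, "上海市": 30, "武汉": 20}
common_words = {"我": 100, "在": 80, "去": 60, "今天": 40}

chars, tags = generate_sample(toponyms, common_words, rng=random.Random(0))
for ch, tag in zip(chars, tags):
    print(ch, tag)
```

Each generated pair of character and tag sequences could then serve as one training example for a sequence labeler such as a bidirectional recurrent network. Because the labels come from the word collections rather than from annotators, they are only as reliable as those collections.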
