Legal artificial intelligence (LegalAI), which aims to benefit the legal domain through artificial intelligence technologies, is currently a hot topic. Legal document classification underlies various LegalAI tasks, such as judgment prediction and similar case matching, and therefore has to be addressed. The majority of current approaches focus on the legal systems of native English-speaking countries. However, both the Chinese language and the Chinese legal system differ significantly from their English-language counterparts. Pre-trained language models (PLMs) have succeeded in NLP, outperforming feature-engineering-based machine learning models as well as traditional deep neural network models such as CNNs and RNNs, but their effectiveness in specific domains, especially the legal domain, needs further investigation. Moreover, few studies have compared these PLMs on specific legal tasks. Therefore, in this paper we train several strong PLMs, which differ in their pre-training corpora, on three datasets of Chinese legal documents. Experimental results show that the model pre-trained on the legal corpus demonstrates high efficiency on all datasets.
Qin et al. studied this question.