October 1, 2024 · Open Access

A dataset of the Chinese-Mongolian bilingual question-answer corpus in the legal field

Abstract

With the development of large-model technology, intelligent question-answering systems are increasingly used in people's work and daily life. However, because data resources are limited, intelligent question-answering systems for low-resource languages such as Mongolian cannot fully meet users' application needs. Building on an existing Chinese question-and-answer (Q&A) corpus, this study constructs a Chinese-Mongolian bilingual Q&A corpus of 50,000 pairs with corresponding classification labels, through three steps: rule-based screening, Chinese-to-Mongolian translation, and manual correction. The dataset provides researchers with rich and accurate question-answering samples for training and evaluating intelligent question-answering systems, as well as for tasks such as machine translation and text classification. Manual evaluation verifies that 92% of the corpus conforms to Chinese-Mongolian bilingual Q&A in the legal field. The dataset therefore holds significant value for advancing research on intelligent question answering in multiple languages, including Chinese and Mongolian.
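As an illustration of the rule-screening step named in the abstract, the following is a minimal sketch only. The abstract does not state the actual rules, so the length thresholds, the duplicate check, and the names `passes_rules` and `screen` are all assumptions introduced here, not the authors' method.

```python
from typing import Iterable, List, Tuple

# Hypothetical screening rules; the study's real rules are not given in the abstract.
MIN_ANSWER_LEN = 5    # assumed lower bound on answer length (characters)
MAX_ANSWER_LEN = 500  # assumed upper bound on answer length (characters)


def passes_rules(question: str, answer: str) -> bool:
    """Return True if a Q&A pair survives the assumed screening rules."""
    q, a = question.strip(), answer.strip()
    if not q or not a:
        return False  # drop pairs with an empty side
    if not (MIN_ANSWER_LEN <= len(a) <= MAX_ANSWER_LEN):
        return False  # drop answers that are too short or too long
    return q != a     # drop pairs whose answer merely repeats the question


def screen(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first valid pair for each distinct question."""
    seen = set()
    kept = []
    for question, answer in pairs:
        key = question.strip()
        if key in seen or not passes_rules(question, answer):
            continue
        seen.add(key)
        kept.append((question, answer))
    return kept
```

Pairs that survive a filter of this kind would then go on to translation and manual correction, which are human or model-driven steps and are not sketched here.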
