February 20, 2023

Parallel Corpus Curation for Filipino Text-to-SQL Semantic Parsing

Abstract

Text-to-SQL models have been developed over the years to allow non-technical users to interact with relational databases. Deep learning approaches require large amounts of labeled data, but the majority of the datasets available today for natural language processing tasks are in English. This makes text-to-SQL semantic parsing in Filipino a promising yet challenging endeavor. Frequent code-switching, the practice of alternating between two or more languages or language varieties in conversation and written text, poses another challenge for semantic parsing in Filipino. This research presents the iTanong corpus, a hand-labeled parallel semantic parsing corpus for Filipino text-to-SQL tasks. The iTanong corpus contains 16,113 Filipino question-SQL pairs from two institutional databases, sourced from students and employees and curated by the research team. The researchers employed part-of-speech tagging to guide the annotation process and to analyze the various structures of the natural language queries. The usability of the corpus was tested with GPT-3 using 1,150 question-SQL pairs, achieving execution and exact-match accuracies of 87.4% and 89.8%, respectively.
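
The abstract reports two metrics without defining them. As a rough illustration only, the sketch below uses one common reading: exact-match accuracy compares the predicted and reference SQL strings after light normalization, while execution accuracy runs both queries and compares the returned rows. The `students` table, the sample queries, and the normalization rule are hypothetical and are not taken from the iTanong corpus or the authors' evaluation code.

```python
import sqlite3


def normalize(sql: str) -> str:
    # Crude normalization (an assumption): lowercase, collapse whitespace,
    # and drop a trailing semicolon. This also lowercases string literals.
    return " ".join(sql.lower().split()).rstrip(";").strip()


def exact_match(pred: str, gold: str) -> bool:
    # True when the two queries are textually identical after normalization.
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    # True when both queries return the same rows, ignoring row order.
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to run counts as wrong
    gold_rows = conn.execute(gold).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


def score(conn, pairs):
    # pairs: list of (predicted_sql, reference_sql); returns (execution, exact-match).
    n = len(pairs)
    em = sum(exact_match(p, g) for p, g in pairs) / n
    ex = sum(execution_match(conn, p, g) for p, g in pairs) / n
    return ex, em


# Hypothetical toy database and predictions, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, year INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [("Ana", 1), ("Ben", 2)])

pairs = [
    # Same query up to case and a trailing semicolon: counts for both metrics.
    ("SELECT name FROM students WHERE year = 1",
     "select name from students where year = 1;"),
    # Different text, same result on this data: counts for execution only.
    ("SELECT name FROM students WHERE year < 2",
     "SELECT name FROM students WHERE year = 1"),
    # Misspelled column: the query fails, so it counts for neither.
    ("SELECT nme FROM students", "SELECT name FROM students"),
]

ex, em = score(conn, pairs)
print(f"execution accuracy: {ex:.3f}, exact-match accuracy: {em:.3f}")
```

Under these definitions the two metrics can diverge in either direction, depending on how strictly strings are normalized and how result sets are compared, so the sketch should not be read as reproducing the reported figures.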
