Under-resourced languages (and musics) pose a challenge to machine translation (MT). The challenge is greater when the collected dataset is a varied sample drawn from a population that is even more diverse and dynamic, as is the case for vocal improvisation in Arab music (mawwal). Here, we present the development of AMICOR, a parallel dataset of vocal improvisatory phrases and their corresponding instrumental responses (tarjamat in Arabic, literally “translations”) in the mawwal tradition. These melodic phrases are treated as “sentences” in the natural-language sense. When developing the dataset, we drew on musicological insights to evaluate music-theoretical differences between sub-datasets, primarily regarding their size, sentence length, performance quality, and shared musical identity. We then experimented with MT to generate instrumental responses to new vocal sentences, comparing translation modeling configurations that differ (1) in translation approach, Neural MT (NMT) versus Statistical MT (SMT), and (2) in how the dataset is handled with respect to the maqam (an Arabic musical term referring roughly to a melodic mode): an individual model for each maqam versus a unified model for all maqamat. We found that merging related sub-datasets does not necessarily lead to better results, and may even favor simpler and shorter sentences with lower performance quality and less sophisticated patterns. This issue affects both NMT and SMT, but it is more pronounced for NMT. A comparison of the confusion matrices of the individual-maqam models suggested that, in such a small dataset, the gap between SMT and NMT performance widens further if the styles or skills of potential users differ from those of the people who built the dataset used in training.
Our discussion identifies the musical background and performance decisions of the vocalists who may use such responsive generative models, together with dataset size and performance quality, as key factors in system design.
Al-Ghawanmeh et al. studied this question.