Script identification is the first step in most multilingual text-processing systems. This study improves the time efficiency of script identification algorithms in three ways. First, the algorithm checks whether the text contains any content written in a given script, and extracts that content only if it does. Second, it checks whether the total length of the text assigned to the scripts identified so far equals the length of the original text; if so, the identification process ends. Third, considering the frequencies of various scripts on the Internet, the more common scripts are tested first. Based on these three approaches, an improved script identification algorithm was designed. A comparison experiment was conducted using sentence-level text corpora in 263 languages written in 26 scripts. The testing time of the proposed method was reduced by a factor of 9.35, while its F1 score for script identification was slightly higher than those reported in our earlier studies. The proposed method therefore effectively improves the time efficiency of script identification algorithms.
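The three approaches described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this study: the function name `identify_scripts`, the four-script list, its ordering, and the Unicode ranges are all hypothetical choices made for illustration, and the sketch assumes that non-letter characters (spaces, digits, punctuation) are discarded before lengths are compared.

```python
import re

# Hypothetical priority list: scripts assumed to be more common online come
# first. The order and the (simplified) Unicode ranges are illustrative only.
SCRIPT_PATTERNS = [
    ("Latin", re.compile(r"[A-Za-z\u00C0-\u024F]+")),
    ("Cyrillic", re.compile(r"[\u0400-\u04FF]+")),
    ("Han", re.compile(r"[\u4E00-\u9FFF]+")),
    ("Arabic", re.compile(r"[\u0600-\u06FF]+")),
]


def identify_scripts(text):
    """Return (runs found per script, number of scripts actually tested)."""
    # Assumption: only letters count toward the length comparison.
    letters = "".join(ch for ch in text if ch.isalpha())
    found = {}
    covered = 0  # total length of text already assigned to a script
    checks = 0   # how many scripts were tested before stopping

    for name, pattern in SCRIPT_PATTERNS:
        # Early exit: everything has been accounted for, so stop.
        if covered == len(letters):
            break
        checks += 1
        # Cheap presence test first; skip extraction if the script is absent.
        if pattern.search(letters) is None:
            continue
        # The script is present, so extract the content written in it.
        runs = pattern.findall(letters)
        found[name] = runs
        covered += sum(len(run) for run in runs)

    return found, checks


if __name__ == "__main__":
    print(identify_scripts("Hello world"))
    print(identify_scripts("Hello мир"))
```

In this sketch, a purely Latin-script sentence is resolved after testing a single script, because the length check succeeds immediately afterwards; text in a script placed later in the list requires more tests, which is why the ordering by frequency matters.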
Qasim et al. (Mon,) studied this question.