User-generated visual content (UGC) now occupies a significant fraction of internet traffic, and billions of UGC videos and pictures are uploaded daily. Among these, short-form videos now account for most of the video consumed by online users. Given the popularity of short-form UGC, being able to control the perceptual quality of UGC videos has emerged as an important problem. Visual UGC is subject to myriad types, severities, and combinations of distortions. While UGC video quality has been closely studied, the quality and legibility of text that is overlaid on or embedded in short-form UGC videos has received relatively little attention. However, being able to accurately predict text quality in images is important, since it affects both the overall perception of the content in which the text is embedded and the messages being conveyed. It also benefits applications involving text recognition in images or videos, which can affect visual search and content identification. Analyzing the quality of text embedded in pictures or videos is a hard problem, since perception of the text is commingled with the surrounding visual content. Our work, which greatly extends our early report on text legibility prediction, contributes both to the psychophysics of embedded text quality and to computational models of its perception. We have created two subjective datasets: the LIVE-COCO Text Legibility (LIVE-COCO-TL) Database (a modification of COCO-Text) and the LIVE-YouTube Text-in-Video Quality (LIVE-YT-TVQ) Database. LIVE-COCO-TL contains 74,440 text patches with legibility annotations, while LIVE-YT-TVQ contains ~19K subjective quality ratings on 405 videos and 641 text patches extracted from them. We build models that predict the legibility and quality of embedded or overlaid text, as well as a multi-task model that simultaneously predicts the overall quality of videos with embedded or overlaid text and local text quality.
We are making the databases and all models freely available at https://live.ece.utexas.edu/research/LIVE YouTubeText Quality Assessment/index.html.
Mandal et al. (Thu,) studied this question.