Towards Testing the Accessibility of Dynamic Visual Changes in Android Mobile GUI with Multi-Modal LLMs

Key Points

VisualDroid achieved a high F1-score of 94.7% for classifying visual changes in GUIs across 34 apps.
The method successfully identified and helped resolve three significant accessibility issues for blind users.
Evaluation included both proprietary and open-source apps, ensuring broader applicability of the findings.
The approach consumes minimal resources, making it a cost-effective solution.

Abstract

User interactions with mobile applications (apps) are accompanied by continuous visual changes in the Graphical User Interface (GUI). These changes help users complete intended tasks or assess the appropriateness of their actions, and are typically conveyed through visual cues such as appearance and color. While such visual changes are effective for sighted users, they are inaccessible to blind users, creating substantial barriers to GUI interaction. To address these challenges, we propose VisualDroid, a method based on a multi-modal large language model (LLM) for testing and classifying GUI visual changes using a tailored three-hop reasoning prompting framework. VisualDroid achieved an F1-score of 94.7% across 34 apps from 17 domains, surpassing all baseline methods. When evaluated on five open-source apps from F-Droid, our method enabled developers to resolve three identified issues, with two still under review. In terms of efficiency and cost, our method shows minimal resource consumption.
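For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-score is computed for a single class from true positives, false positives, and false negatives. The function name and the counts are hypothetical and are not taken from the study; the paper's own evaluation may aggregate over several visual-change classes in a way not described here.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score: the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (assumed, not from the paper).
score = f1_score(tp=80, fp=10, fn=30)
print(f"F1 = {score:.3f}")
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is why it is commonly preferred over plain accuracy for classification tasks with uneven class sizes.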
