Abstract: Artificial Intelligence (AI) has rapidly emerged as a transformative force in language proficiency testing and assessment. Traditional testing models face persistent challenges such as high costs, inter-rater variability, long scoring timelines, and limited scalability. AI-driven systems address these challenges by enabling automated scoring, adaptive test design, real-time feedback, and large-scale deployment. This paper critically explores the integration of AI into language proficiency assessment, emphasizing five key pillars: validity, reliability, fairness, transparency, and practicality. It reviews the evolution from early automated scoring methods based on surface features to advanced systems that leverage large language models (LLMs) and end-to-end speech recognition models. Key applications are examined across the receptive and productive skills of listening, speaking, reading, and writing (LSRW). The paper highlights challenges including bias detection, fairness across demographic and linguistic backgrounds, data privacy, and the interpretability of AI-generated scores. Methodological frameworks such as psychometric analysis, generalizability theory, and human–AI concordance measures are discussed as tools for ensuring robustness. The paper also underscores the pedagogical and ethical implications of AI-enabled assessment, particularly the washback effect on teaching and learning practices. By proposing a modular architecture for AI-based testing and offering governance guidelines, the study provides a roadmap for educational institutions and policymakers. It concludes that AI should not replace human judgment but should be deployed as a complementary tool under responsible governance. When implemented effectively, AI can enhance efficiency, consistency, and accessibility in language assessment while safeguarding fairness and fostering positive educational outcomes.
Keywords: Artificial Intelligence, Language Assessment, Automated Scoring, Validity, Reliability, Fairness, Transparency, Generative AI, Educational Measurement, Language Testing, Ethics.
Munianjinappa et al. (Mon,) studied this question.