What question did this study set out to answer?

This study aims to compare different OCR architectures for recognizing Korean license plates under identical conditions.

February 14, 2026

Open Access

A Comparative Study of OCR Architectures for Korean License Plate Recognition: CNN–RNN-Based Models and MobileNetV3–Transformer-Based Models

Key Points

Compares OCR architectures for Korean license plate recognition under identical detection conditions.
Uses a single YOLOv12-based license plate detector for all configurations so that the comparison is fair.
Evaluates a CNN with an Attention-LSTM decoder and a MobileNetV3 with a Transformer decoder.
Includes a controlled ablation study that fixes the CNN backbone to ResNet-18 and varies only the sequence decoder.
Runs experiments on both static image datasets and tracking-based sequential datasets.
Assesses recognition accuracy, error characteristics, and processing speed on GPU and embedded platforms.
Finds that the effectiveness of sequence decoders depends on the dataset, feature quality, and region-of-interest (ROI) stability.
Finds that tracking-induced error accumulation dominates OCR performance in sequential recognition scenarios.
Identifies error patterns specific to Korean license plates that generic OCR benchmarks do not capture.
Finds that Transformer-based OCR models add significant computational and memory overhead on embedded platforms, which limits their suitability for real-time deployment.

Abstract

This paper presents a systematic comparative study of optical character recognition (OCR) architectures for Korean license plate recognition under identical detection conditions. Although recent automatic license plate recognition (ALPR) systems increasingly adopt Transformer-based decoders, it remains unclear whether performance differences arise primarily from sequence modeling strategies or from backbone feature representations. To address this issue, we employ a unified YOLOv12-based license plate detector and evaluate multiple OCR configurations, including a CNN with an Attention-LSTM decoder and a MobileNetV3 with a Transformer decoder. To ensure a fair comparison, a controlled ablation study is conducted in which the CNN backbone is fixed to ResNet-18 while varying only the sequence decoder. Experiments are performed on both static image datasets and tracking-based sequential datasets, assessing recognition accuracy, error characteristics, and processing speed across GPU and embedded platforms. The results demonstrate that the effectiveness of sequence decoders is highly dataset-dependent and strongly influenced by feature quality and region-of-interest (ROI) stability. Quantitative analysis further shows that tracking-induced error accumulation dominates OCR performance in sequential recognition scenarios. Moreover, Korean license plate–specific error patterns reveal failure modes not captured by generic OCR benchmarks. Finally, experiments on embedded platforms indicate that Transformer-based OCR models introduce significant computational and memory overhead, limiting their suitability for real-time deployment. These findings suggest that robust license plate recognition requires joint consideration of detection, tracking, and recognition rather than isolated optimization of OCR architectures.
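The abstract mentions recognition accuracy and error characteristics without stating how they are computed. The following is a minimal sketch of how such measurements are commonly made, assuming plate-level exact-match accuracy, a character error rate based on Levenshtein distance, and a tally of character substitutions. The function names and the sample plate strings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between a reference and a predicted string."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a predicted character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def plate_accuracy(refs, hyps):
    """Fraction of plates whose predicted string matches the reference exactly."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


def character_error_rate(refs, hyps):
    """Total edit distance divided by the total number of reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def substitution_pairs(refs, hyps):
    """Count (reference, predicted) character confusions.

    Only equal-length pairs are compared position by position; pairs with
    insertions or deletions are skipped in this simplified sketch.
    """
    pairs = Counter()
    for r, h in zip(refs, hyps):
        if len(r) == len(h):
            pairs.update((a, b) for a, b in zip(r, h) if a != b)
    return pairs


# Hypothetical sample data, for illustration only (not from the study).
references = ["12가3456", "34나5678", "56다7890"]
predictions = ["12가3456", "34니5678", "56다789"]

print("plate accuracy:", plate_accuracy(references, predictions))
print("character error rate:", character_error_rate(references, predictions))
print("substitutions:", substitution_pairs(references, predictions))
```

A substitution tally of this kind is one way to surface plate-specific confusions, such as visually similar Hangul syllables, that an aggregate accuracy figure would hide.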
