To address bounding-box merging, missed detections, and class confusion in complex document layouts, this study proposes YOLO-GFD, a lightweight document layout detection algorithm that balances global layout modeling with fine-grained feature representation. Built on YOLO11n, the method makes three changes: it introduces an RMSNorm-optimized AIFI-Lite module at the high-semantic stage to strengthen long-range dependency modeling with improved stability and parameter efficiency; it adds an enhanced upsampling and reconstruction mechanism to the feature pyramid to better preserve edge and texture details; and it employs a hybrid convolution–attention structure in the mid-scale branch to improve discrimination between adjacent regions. On the self-constructed ExamDoc-CN dataset, YOLO-GFD improves mAP@0.5 and mAP@0.5:0.95 over YOLO11n by 1.3 and 2.8 percentage points, respectively. On the CDLA and IIIT-AR-13K datasets, mAP@0.5 increases by 1.0 and 0.8 percentage points and mAP@0.5:0.95 by 1.8 and 0.4 percentage points, respectively. These results indicate that YOLO-GFD delivers consistent gains across different document layout scenarios with only marginal computational overhead, an effective trade-off between detection accuracy and efficiency.
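As background for the RMSNorm component named above, the following is a minimal sketch of standard RMSNorm next to standard LayerNorm, written in plain Python for a single feature vector. It is not the AIFI-Lite module itself, whose internals are not described here; the function names `rms_norm` and `layer_norm`, the `eps` value, and the sample vector are illustrative assumptions. The sketch only shows that RMSNorm skips mean-centering and carries a gain but no bias, which is one plausible source of the stability and parameter savings mentioned.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Standard RMSNorm: divide by the root mean square, then apply a per-feature gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * inv_rms for v, g in zip(x, gain)]


def layer_norm(x, gain, bias, eps=1e-6):
    """Standard LayerNorm: subtract the mean, divide by the std, apply gain and bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gain, bias)]


# Hypothetical 4-dimensional token feature, used only for illustration.
features = [3.0, 4.0, 0.0, 0.0]
ones = [1.0] * len(features)
zeros = [0.0] * len(features)

rms_out = rms_norm(features, ones)          # learnable parameters: gain only
ln_out = layer_norm(features, ones, zeros)  # learnable parameters: gain and bias

# RMS of the RMSNorm output; close to 1 when the gain is all ones.
rms_of_output = math.sqrt(sum(v * v for v in rms_out) / len(rms_out))
```

With a unit gain, RMSNorm rescales the vector to roughly unit root mean square without shifting it, so zero entries stay zero, whereas LayerNorm also recenters the vector to zero mean.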
Ju et al. (Sun,) studied this question.