MILU provides the first comprehensive benchmark for evaluating structured understanding of scientific lecture slides. The results show that current VLMs achieve high formatting reliability but low semantic consistency. MILU establishes a foundation for future expert-annotated benchmarks, diagram- and math-aware modeling, and improved methods for scientific lecture interpretation.
Manik et al. studied this question.