The rapid integration of large multimodal foundation models (LMFMs) such as GPT-4V, Gemini, and Claude 3 into critical production systems, spanning autonomous vehicles, clinical diagnostics, and financial analytics, has exposed a serious and growing security vulnerability: their susceptibility to adversarial attacks. This paper provides an engineering-centric analysis of adversarial machine learning in the threat landscape of production-scale multimodal AI. It systematically deconstructs the expanded attack surface, where vulnerabilities are not merely additive but synergistic, arising from the very cross-modal fusion mechanisms that constitute these models' core capability. The analysis covers three classes of attack: gradient-based white-box attacks on vision encoders; query-efficient black-box strategies that leverage surrogate models and transferability; and cross-modal co-adaptive attacks, such as "image hijacking", that distribute perturbations across vision and language to steer model behavior maliciously. The paper also examines the need to move beyond empirical defenses such as adversarial training toward formally verified, certified robustness.
Aditiya Widodo Putra (Wed,) studied this question.