Universal Adversarial Attack on Aligned Multimodal LLMs
Temurbek Rahmatullaev, Polina Druzhinina, Matvey M...
The paper presents an attack that optimizes a single image to bypass the safety measures of aligned multimodal LLMs, forcing them to generate harmful responses across different prompts and models.