The rapid proliferation of social media and generative large language models has increased multimodal harmful content, making harmful meme detection and explanation generation crucial for content moderation. In Chinese social media, meme harmfulness relies on implicit visual–textual interactions in cultural contexts, but existing research lacks a comprehensive understanding of such cultural specificity. This neglect of the social background knowledge and metaphorical expressions inherent in memes results in limited detection performance. To address this challenge, we propose a novel fine-grained explanation-enhanced Chinese harmful meme detection framework (FG-E2HMD), a framework using Multimodal Large Language Models (MLLMs) with a culturally aware explanation generation module to produce structured explanations, which integrate with multimodal features for decision-making. Comprehensive quantitative experiments and qualitative analyses were conducted on ToxiCN MM, the first large-scale dataset dedicated to Chinese harmful meme detection. The experimental results reveal that existing methods still have significant limitations in detecting Chinese harmful memes. Concurrently, our framework improves detection accuracy and decision transparency by incorporating explicit Chinese cultural background knowledge, paving the way for more intelligent, culturally adaptive content moderation systems.
Chen et al. (Mon,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: