What question did this study set out to answer?

The research aims to enhance on-device human action recognition while preserving user privacy.

April 12, 2026 · Open Access

MAD‐HAR: Privacy‐Preserving On‐Device Human Action Recognition via Multiagent LLM Debate

Key Points

The research aims to enhance on-device human action recognition while preserving user privacy.
Introduced the MAD-HAR framework, which uses a lightweight vision–language model to convert visual inputs into semantic captions on the device
Implemented a structured multiround debate among seven heterogeneous language model agents (8B to 14B parameters) to improve reasoning
Prompted agents to generate structured rationales, rather than simple labels, to justify their outputs
Conducted preliminary experiments to select the VLM backbone, followed by main and ablation studies
Demonstrated a significant improvement in macro-F1 score
Maximized consensus among agents
Showed consistent net rectification of errors through collaborative critique

Abstract

Human action recognition (HAR) is a fundamental component of ubiquitous computing, yet its wide‐ranging application is hindered by privacy concerns. Specifically, high‐accuracy models typically require cloud‐based processing that compromises sensitive visual data, while privacy‐preserving on‐device models suffer from limited reasoning capacity and frequent hallucinations. To resolve this conflict, we introduce multiagent debate for HAR (MAD‐HAR), a novel framework designed for strictly local environments. MAD‐HAR leverages a lightweight vision–language model (VLM) with a granular prompt to convert visual inputs into semantic captions, anonymizing data before inference. To mitigate reasoning failures, a heterogeneous ensemble of small and medium language model agents (ranging from 8B to 14B parameters) engages in a structured multiround debate. Rather than outputting simple labels, agents are prompted to generate structured rationales that explicitly justify their logic, using collaborative critique to override hallucinations. We evaluate our approach on public benchmarks. Preliminary experiments guided the selection of the optimal VLM backbone, while extensive main and ablation studies suggest that scaling to a seven‐agent pool with rationale‐driven debate synthesizes higher‐order reasoning. Experimental results show that MAD‐HAR significantly improves macro‐F1, while maximizing consensus and yielding consistent net error rectification.
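
The debate loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the agents here are hypothetical rule-based stand-ins for language models, the caption is assumed to have already been produced by the VLM, and resolving the final label by majority vote is an assumption, since the abstract does not state how the final decision is made. All names (`Opinion`, `keyword_agent`, `debate`) are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Opinion:
    label: str      # predicted action label
    rationale: str  # short justification that peers can read


# An agent maps (caption, opinions from the previous round) to a new opinion.
Agent = Callable[[str, List[Opinion]], Opinion]


def keyword_agent(keyword: str, label: str, fallback: str) -> Agent:
    """Hypothetical stand-in for a language model agent (assumption)."""
    def agent(caption: str, peers: List[Opinion]) -> Opinion:
        if keyword in caption:
            return Opinion(label, f"caption mentions '{keyword}'")
        if peers:
            # No direct evidence: adopt the most common label from last round.
            top, _ = Counter(p.label for p in peers).most_common(1)[0]
            return Opinion(top, "deferring to peer majority")
        return Opinion(fallback, "no cue found; guessing")
    return agent


def debate(caption: str, agents: List[Agent],
           max_rounds: int = 3) -> Tuple[str, float]:
    """Run a multiround debate; return (final label, agreement ratio)."""
    # Round 1: every agent answers independently from the caption alone.
    opinions = [agent(caption, []) for agent in agents]
    for _ in range(max_rounds - 1):
        if len({o.label for o in opinions}) == 1:
            break  # unanimous, so stop early
        # Later rounds: each agent sees all opinions from the previous round.
        opinions = [agent(caption, opinions) for agent in agents]
    # Majority vote is an assumed aggregation rule, not taken from the paper.
    label, count = Counter(o.label for o in opinions).most_common(1)[0]
    return label, count / len(opinions)


if __name__ == "__main__":
    caption = "a person lifts a cup to their mouth"
    agents = [
        keyword_agent("cup", "drinking", "waving"),
        keyword_agent("mouth", "drinking", "eating"),
        keyword_agent("phone", "calling", "calling"),
    ]
    print(debate(caption, agents))
```

In this toy run the third agent finds no cue in the caption and initially guesses, then revises its answer after seeing its peers' opinions, which loosely mirrors the idea of collaborative critique overriding an unsupported prediction.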
