Privacy-Preserving Federated Action Recognition via Differentially Private Selective Tuning and Efficient Communication

cs.CV Idris Zakariyya, Pai Chet Ng, Kaushik Bhargav Sivangi, S. Mohammad Sheikholeslami, Konstantinos N. Plataniotis, Fani Deligianni · Mar 22, 2026


Plain-language introduction

Federated video action recognition faces a dual challenge: gradient sharing risks leaking sensitive motion patterns, while synchronizing high-dimensional video models incurs prohibitive bandwidth costs. This paper proposes FedDP-STECAR, which selectively fine-tunes only task-relevant layers under differential privacy and transmits only those layers, claiming over 99% communication reduction alongside strong privacy guarantees ($\epsilon \leq 1.33$). The work matters for enabling practical privacy-preserving video analysis in healthcare and surveillance where data cannot be centralized.

Critical review

Verdict

Bottom line

The paper presents a pragmatic combination of selective fine-tuning, differential privacy, and top-level sampling (TLS) for federated video recognition. The communication savings are compelling and the aggregation-agnostic design is welcome, but the evaluation relies on small-scale setups (2–20 clients) on a single dataset (UCF-101). Key comparative claims also rest on a weak baseline, full-model fine-tuning under tight DP budgets, whose accuracy predictably collapses. A typographical error in Algorithm 1 ("FedDP-SsTEER" instead of "FedDP-STECAR") and inconsistent use of privacy accounting methods further detract from the presentation.

“FedDP-SsTEER”

paper · Algorithm 1 caption

What holds up

The selective tuning strategy (Eq. 8–10) confines DP noise to high-impact layers, yielding substantial bandwidth reduction. The authors report that "communication traffic is reduced by over $99\%$ compared to full-model updates" by transmitting only the selectively tuned subset $\theta_t$, and that the method maintains roughly 73% accuracy at $\epsilon=1.33$, where full tuning drops below 23% (Table 2). The code release and compatibility with multiple aggregators (FedAvg and FedNova) strengthen the practical contribution.

“communication traffic is reduced by over 99% compared to full-model updates”

paper · Abstract

“communication traffic per client per round drops from 1456 MB to just 3.1 MB—a reduction of 99.8%”

paper · Section 5.1
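The quoted reduction can be checked with a line of arithmetic. The short Python sketch below uses only the two per-client, per-round payload sizes quoted from Section 5.1; the function name is illustrative and not taken from the authors' code.

```python
def reduction_percent(full_mb: float, selective_mb: float) -> float:
    """Percentage of per-round traffic saved by sending the smaller payload."""
    if full_mb <= 0:
        raise ValueError("full_mb must be positive")
    return 100.0 * (1.0 - selective_mb / full_mb)


# Payload sizes quoted from Section 5.1 of the paper.
FULL_MODEL_MB = 1456.0
SELECTIVE_MB = 3.1

saving = reduction_percent(FULL_MODEL_MB, SELECTIVE_MB)
ratio = FULL_MODEL_MB / SELECTIVE_MB
print(f"saving: {saving:.1f}% (payload about {ratio:.0f}x smaller)")
```

The result rounds to 99.8%, which matches the quoted figure and is consistent with the abstract's "over 99%".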

Main concerns

The claim that selective tuning achieves "up to $70.2\%$ higher accuracy" under strict privacy ($\epsilon=0.65$) is misleadingly phrased: it reflects an absolute gap in percentage points against a baseline that collapses under strong DP noise, not a robust relative improvement. The paper asserts applicability to "non-IID federated environments" but provides no heterogeneity metrics (e.g., Dirichlet $\alpha$) or evidence that UCF-101 was partitioned non-IID. The privacy cost in Eq. 4 and 11 uses an approximate asymptotic expression ($\varepsilon_{\text{priv}} = \frac{q}{\sigma}\sqrt{2E\ln(1/\delta)}$) rather than a tight composition accountant (e.g., Gaussian DP, PLD, or FFT-based accounting), so the reported $\epsilon$ values may misstate the true privacy guarantee.

“At $\epsilon=0.65$, DP-Adam-SelTLS achieves $70.21\%$ higher accuracy than DP-Adam-FT”

paper · Section 5.2

“the approximate privacy cost can be expressed as: $\varepsilon_{\text{priv}}(E,q,\sigma,\delta)=\frac{q}{\sigma}\sqrt{2E\ln(1/\delta)}$”

paper · Section 2.1
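To make the scaling of the quoted expression concrete, here is a minimal Python sketch that evaluates it and inverts it for $\sigma$. It assumes $q$ is the sampling rate, $\sigma$ the noise multiplier, $E$ the number of composed steps, and $\delta$ the failure probability; the numeric settings are hypothetical and not taken from the paper, and the sketch is not a substitute for a tight accountant.

```python
import math


def approx_epsilon(steps: int, q: float, sigma: float, delta: float) -> float:
    """Evaluate (q / sigma) * sqrt(2 * E * ln(1 / delta)) as quoted in the review."""
    if steps <= 0 or sigma <= 0 or not (0.0 < q <= 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("invalid privacy parameters")
    return (q / sigma) * math.sqrt(2.0 * steps * math.log(1.0 / delta))


def sigma_for_epsilon(target_eps: float, steps: int, q: float, delta: float) -> float:
    """Invert the same expression: noise multiplier needed for a target epsilon."""
    if target_eps <= 0:
        raise ValueError("target_eps must be positive")
    return (q / target_eps) * math.sqrt(2.0 * steps * math.log(1.0 / delta))


# Hypothetical settings chosen for illustration only.
E, Q, DELTA = 1000, 0.01, 1e-5

eps = approx_epsilon(E, Q, sigma=1.0, delta=DELTA)
sigma_needed = sigma_for_epsilon(0.65, E, Q, DELTA)
print(f"epsilon at sigma=1.0: {eps:.4f}")
print(f"sigma for epsilon=0.65: {sigma_needed:.4f}")
```

Under this expression the cost grows with $\sqrt{E}$ and shrinks with $1/\sigma$, so quadrupling the number of steps doubles $\varepsilon_{\text{priv}}$. Whether the resulting value is conservative would have to be checked against a numerical accountant.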

Evidence and comparison

Evidence is limited to UCF-101 with 2–20 clients, a scale insufficient to validate scalability claims for realistic cross-device or cross-silo deployments. Comparisons are restricted to full fine-tuning under DP, omitting standard parameter-efficient baselines such as LoRA, adapters, or BitFit that could achieve similar communication savings without DP-specific tuning. The aggregation-agnostic claim is supported only by FedAvg and FedNova, leaving out other heterogeneity-aware methods such as FedProx or SCAFFOLD.
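For reference, the kind of heterogeneity control the review asks for (a Dirichlet $\alpha$ label partition) can be sketched in a few lines of standard-library Python. This is a common construction in the federated learning literature, not something the paper is known to use; the label list, client count, and function name below are all hypothetical.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha) shares."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # A Dirichlet draw is a vector of Gamma(alpha, 1) draws, normalised.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights)
        if total == 0.0:  # guard against underflow for very small alpha
            weights, total = [1.0] * num_clients, float(num_clients)
        start, cumulative = 0, 0.0
        for c, w in enumerate(weights):
            cumulative += w
            # The last client takes the remainder so no index is dropped.
            end = len(idxs) if c == num_clients - 1 else round(len(idxs) * cumulative / total)
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Hypothetical toy labels: 10 classes, 100 samples each.
toy_labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(toy_labels, num_clients=5, alpha=0.1, seed=0)
print([len(p) for p in parts])
```

Smaller values of $\alpha$ concentrate each class on fewer clients, while large values approach an IID split. Reporting $\alpha$ and the resulting per-client class histograms would let readers judge how non-IID the setup is.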

Reproducibility

The authors provide a GitHub link (https://github.com/izakariyya/mvit-federated-videodp), but critical hyperparameters—such as the learning rate, batch size per client, and the exact criterion for selecting the trainable subset $\theta_t$—are omitted from the manuscript. Hardware specifications (GPU type, memory), random seeds, and the precise train/validation splits for the 80–20 partition are not reported. Without these details, independent reproduction of the claimed 48% runtime improvement and 70.2% accuracy gains is not possible.

“Code available at https://github.com/izakariyya/mvit-federated-videodp”

paper · Abstract
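As an illustration of what reporting the split would involve, the sketch below builds a deterministic 80–20 partition from a seed. The clip identifiers and the seed value are hypothetical; the paper's actual splitting procedure is not described in the review.

```python
import random


def seeded_split(items, train_fraction=0.8, seed=0):
    """Deterministic train/validation split that ignores the input ordering."""
    order = sorted(items)  # canonical order, so only the seed decides the split
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_fraction)
    return order[:cut], order[cut:]


# Hypothetical clip identifiers standing in for real video file names.
video_ids = [f"clip_{i:04d}" for i in range(100)]

train_ids, val_ids = seeded_split(video_ids, train_fraction=0.8, seed=42)
print(len(train_ids), len(val_ids))
```

Publishing the seed together with the sorted file list is enough for a third party to regenerate the same partition.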

Abstract

Federated video action recognition enables collaborative model training without sharing raw video data, yet remains vulnerable to two key challenges: model exposure and communication overhead. Gradients exchanged between clients and the server can leak private motion patterns, while full-model synchronization of high-dimensional video networks causes significant bandwidth and communication costs. To address these issues, we propose Federated Differential Privacy with Selective Tuning and Efficient Communication for Action Recognition, namely FedDP-STECAR. Our FedDP-STECAR framework selectively fine-tunes and perturbs only a small subset of task-relevant layers under Differential Privacy (DP), reducing the surface of information leakage while preserving temporal coherence in video features. By transmitting only the tuned layers during aggregation, communication traffic is reduced by over 99% compared to full-model updates. Experiments on the UCF-101 dataset using the MViT-B-16x4 transformer show that FedDP-STECAR achieves up to 70.2% higher accuracy under strict privacy ($\epsilon=0.65$) in centralized settings and 48% faster training with 73.1% accuracy in federated setups, enabling scalable and privacy-preserving video action recognition. Code available at https://github.com/izakariyya/mvit-federated-videodp
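The mechanism described in the abstract (perturb only a tuned subset of layers, then transmit and aggregate only that subset) can be illustrated with a minimal Python sketch. It assumes standard DP-SGD-style per-sample clipping with Gaussian noise and a sample-weighted FedAvg; it is not the paper's Algorithm 1, and the layer names, gradients, and hyperparameters are hypothetical.

```python
import math
import random


def dp_selective_update(per_sample_grads, trainable, max_norm, sigma, rng):
    """Noisy mean gradient over the `trainable` layers only (DP-SGD style)."""
    n = len(per_sample_grads)
    summed = {name: [0.0] * len(per_sample_grads[0][name]) for name in trainable}
    for grads in per_sample_grads:
        # Clip each sample's gradient jointly across the trainable layers.
        sq_norm = sum(g * g for name in trainable for g in grads[name])
        scale = min(1.0, max_norm / math.sqrt(sq_norm)) if sq_norm > 0 else 1.0
        for name in trainable:
            for i, g in enumerate(grads[name]):
                summed[name][i] += g * scale
    # Gaussian noise with std sigma * max_norm is added to the clipped sum.
    return {
        name: [(s + rng.gauss(0.0, sigma * max_norm)) / n for s in values]
        for name, values in summed.items()
    }


def fedavg_subset(client_updates, client_sizes):
    """Sample-weighted average of the transmitted layers only."""
    total = sum(client_sizes)
    return {
        name: [
            sum(u[name][i] * n for u, n in zip(client_updates, client_sizes)) / total
            for i in range(len(values))
        ]
        for name, values in client_updates[0].items()
    }


# Hypothetical two-layer model: "backbone" stays frozen, "head" is tuned.
TRAINABLE = ["head"]
client_grads = [
    {"backbone": [3.0, 4.0], "head": [3.0, 4.0]},
    {"backbone": [1.0, 1.0], "head": [0.0, 0.5]},
]
rng = random.Random(0)
noiseless = dp_selective_update(client_grads, TRAINABLE, max_norm=1.0, sigma=0.0, rng=rng)
noisy = dp_selective_update(client_grads, TRAINABLE, max_norm=1.0, sigma=1.0, rng=rng)
merged = fedavg_subset([{"head": [1.0, 2.0]}, {"head": [3.0, 6.0]}], client_sizes=[1, 3])
print(sorted(noiseless), merged)
```

Because frozen layers never appear in the returned dictionaries, the payload each client sends scales with the tuned subset rather than the full model, and the server can average those entries with any aggregator that operates on named parameters.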
