Knowledge Priors for Identity-Disentangled Open-Set Privacy-Preserving Video FER
This paper addresses privacy-preserving facial expression recognition (FER) in video without requiring identity labels, a critical gap because real-world deployments often lack identity annotations. The core idea is to use intra- and inter-video knowledge priors to train an identity suppression network, followed by a denoising module, which enables open-set privacy preservation. This matters because current methods either require closed-set identity supervision or suffer from entangled privacy-utility trade-offs that degrade performance.
The paper presents a well-motivated approach to a practical problem, offering a novel framework that decouples privacy and utility through a two-stage training process. The introduction of a falsification-based validation protocol without identity labels addresses a significant evaluation gap in open-set scenarios. However, while the method shows promise on in-the-wild data, the theoretical foundations of why the denoising module improves privacy remain underdeveloped, and some experimental comparisons conflate training on source data with zero-shot transfer to target datasets.
The knowledge prior extraction strategy using temporal tracking for pseudo-labels is technically sound and robust to identity switches since triplets are sampled from different videos. The falsification-based validation protocol (Rules 4-7 in Table II) is a creative solution for evaluating privacy without ground-truth identity labels, achieving $P_{pre}$ of 0.9620 on DFEW compared to 0.6843 for adversarial baselines. The ablation study demonstrating that removing expression labels from batch construction degrades performance validates the importance of expression-aware sampling.
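The triplet construction praised above can be illustrated with a minimal sketch. It assumes that the intra-video prior means frames from one tracked face share an identity, and that the inter-video prior means faces from different videos belong to different identities. The function name `sample_triplets` and the `(video_id, track_id, frame_ids)` layout are hypothetical and are not taken from the paper's Algorithm 1; expression-aware batch construction is not modelled here.

```python
import random


def sample_triplets(tracks, num_triplets, seed=42):
    """Sample (anchor, positive, negative) frame references without identity labels.

    tracks: list of (video_id, track_id, frame_ids) tuples, where frame_ids is a
            list of frame indices produced by a face tracker (hypothetical layout).
    Returns a list of triplets; each element is a (video_id, frame_id) pair.
    """
    rng = random.Random(seed)
    # A track needs at least two frames to supply both an anchor and a positive.
    usable = [t for t in tracks if len(t[2]) >= 2]
    triplets = []
    if not usable:
        return triplets
    for _ in range(num_triplets):
        video_id, _track_id, frames = rng.choice(usable)
        # Assumed intra-video prior: two frames of one track show the same person.
        anchor, positive = rng.sample(frames, 2)
        # Assumed inter-video prior: a face from another video is a different person.
        others = [t for t in tracks if t[0] != video_id and t[2]]
        if not others:
            continue  # only one video available, so no negative can be drawn
        neg_video, _neg_track, neg_frames = rng.choice(others)
        negative = rng.choice(neg_frames)
        triplets.append(
            ((video_id, anchor), (video_id, positive), (neg_video, negative))
        )
    return triplets
```

Because the negative always comes from a different video, a tracker that swaps identities inside one video cannot turn a negative into a hidden positive, which is the robustness property the review refers to.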
The privacy preservation ratio $P_{pre}$ calculation excludes cases where different identities might be incorrectly matched (focusing only on falsified cases), potentially overstating privacy guarantees. The paper explicitly acknowledges that the denoising module's theoretical foundations "require further study," yet still claims it improves privacy preservation without mechanistic justification. Additionally, the comparison with closed-set baselines on CREMA-D and RAVDESS is confounded by the fact that closed-set methods retrain on these target datasets while the proposed method transfers from DFEW, yet performance gaps are attributed solely to the closed vs. open-set distinction.
The evidence supports the claim that the method achieves comparable FER accuracy to baselines while providing stronger privacy protection in open-set settings, with FER accuracy of 36.54% on DFEW versus 32.44% for adversarial approaches (Table IV). However, the SSIM/PSNR metrics in Table VI show the method falls short of closed-set approaches (Contr-HL), so the claim of surpassing "all open-set approaches" must be read together with the remaining gap to closed-set methods. The falsification-based validation is novel but lacks comparison against standard re-identification attacks that might reveal vulnerabilities not captured by the binary matching classifier $f_{match}$.
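For readers less familiar with the image-quality metrics cited from Table VI, PSNR has a standard definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below computes it over flat pixel sequences, assuming 8-bit intensities by default. It is illustrative only and does not reproduce how the paper computed its reported numbers; the function name `psnr` is hypothetical.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value is the largest possible pixel intensity (255 assumes 8-bit images).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between corresponding pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the processed frame is closer to the original, so in a privacy setting the metric has to be interpreted alongside the privacy results rather than on its own.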
The paper provides comprehensive implementation details including specific hyperparameters ($\alpha=0.01$, 400 epochs, batch size 1024), pre-trained backbones (RetinaFace, ArcFace), and dataset splits (Section IV-A, VII-B). However, no code repository or data release is mentioned, which would be critical for reproducing the complex multi-stage pipeline involving tracking, privacy preservation, denoising, and falsification-based validation. The reliance on specific random seeds (fixed at 42) is mentioned, but the exact data preprocessing flow and triplet sampling strategy require careful reconstruction from Algorithm 1 and Section III-B.
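Someone attempting a reproduction might start by pinning the reported settings in one place. The following is a minimal sketch that records only the values quoted in this review; the class name `ReportedSettings`, the field names, and the helper `seed_stdlib` are hypothetical, and the role of $\alpha$ is not stated here, so it is stored without interpretation.

```python
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReportedSettings:
    """Values quoted in the review; field names are illustrative assumptions."""

    alpha: float = 0.01  # reported hyperparameter; what it weights is not stated here
    epochs: int = 400
    batch_size: int = 1024
    seed: int = 42
    pretrained_models: tuple = ("RetinaFace", "ArcFace")


def seed_stdlib(settings):
    """Seed Python's built-in RNG only.

    Numerical and deep learning libraries keep their own generators and would
    need separate seeding calls, which are omitted from this sketch.
    """
    random.seed(settings.seed)


if __name__ == "__main__":
    settings = ReportedSettings()
    seed_stdlib(settings)
    print(asdict(settings))
```

Freezing the dataclass keeps the recorded values from being changed accidentally between the stages of a multi-stage pipeline.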
Facial expression recognition relies on facial data that inherently expose identity and thus raise significant privacy concerns. Current privacy-preserving methods typically fail in realistic open-set video settings where identities are unknown and identity labels are unavailable. We propose a two-stage framework for video-based privacy-preserving FER in challenging open-set settings that requires no identity labels at any stage. To decouple privacy and utility, we first train an identity-suppression network using intra- and inter-video knowledge priors derived from real-world videos without identity labels. This network anonymizes identity while preserving expressive cues. A subsequent denoising module restores expression-related information and helps recover FER performance. Furthermore, we introduce a falsification-based validation method that uses recognition priors to rigorously evaluate privacy robustness without requiring annotated identity labels. Experiments on three video datasets demonstrate that our method effectively protects privacy while maintaining FER accuracy comparable to identity-supervised baselines.