Knowledge Priors for Identity-Disentangled Open-Set Privacy-Preserving Video FER

cs.CV Feng Xu, Xun Li, Lars Petersson, Yulei Sui, David Ahmedt Aristizabal, Dadong Wang · Mar 22, 2026
AI Review
Plain-language introduction

This paper addresses privacy-preserving facial expression recognition (FER) in video without requiring identity labels, a critical gap because real-world deployments often lack identity annotations. The core idea is to use intra- and inter-video knowledge priors to train an identity suppression network, followed by a denoising module, which enables open-set privacy preservation. This matters because current methods either require closed-set identity supervision or suffer from entangled privacy-utility trade-offs that degrade performance.

Critical review
Verdict
Bottom line

The paper presents a well-motivated approach to a practical problem, offering a novel framework that decouples privacy and utility through a two-stage training process. The introduction of a falsification-based validation protocol without identity labels addresses a significant evaluation gap in open-set scenarios. However, while the method shows promise on in-the-wild data, the theoretical foundations of why the denoising module improves privacy remain underdeveloped, and some experimental comparisons conflate training on source data with zero-shot transfer to target datasets.

“Finally, although the denoising module improves both FER and privacy preservation, its theoretical foundations require further study.”
Section V · Conclusion
What holds up

The knowledge prior extraction strategy, which uses temporal tracking to generate pseudo-labels, is technically sound and robust to identity switches because triplets are sampled from different videos. The falsification-based validation protocol (Rules 4-7 in Table II) is a creative solution for evaluating privacy without ground-truth identity labels. Under this protocol, the method achieves a $P_{pre}$ of 0.9620 on DFEW, compared to 0.6843 for adversarial baselines. The ablation study shows that removing expression labels from batch construction degrades performance, which validates the importance of expression-aware sampling.

“while high similarity thresholds can effectively avoid identity merging, they may introduce identity switches. However, since each triplet in a batch is sampled from a different video, such switches do not affect dataset construction.”
Section III-B · Paragraph 3
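The sampling argument quoted above can be illustrated with a minimal sketch. It assumes a hypothetical data layout (`tracks_by_video`) and a hypothetical triplet recipe: anchor and positive come from one track, and the negative comes from a track in a different video. The paper's actual batch construction, including its expression-aware sampling, may differ. The point is only that when every triplet is anchored in a distinct video, two tracks of the same person in the same video (an identity switch) are never paired against each other.

```python
import random


def sample_triplets(tracks_by_video, batch_size, rng):
    """Sample one triplet per distinct video (hypothetical recipe).

    tracks_by_video: {video_id: {track_id: [frame, ...]}}; each track is
    assumed to hold at least two frames of a single tracked face.
    """
    if batch_size < 2 or batch_size > len(tracks_by_video):
        raise ValueError("need 2 <= batch_size <= number of videos")

    # Distinct videos: at most one track per video enters the batch, so a
    # person split across two tracks of one video cannot appear twice.
    videos = rng.sample(sorted(tracks_by_video), batch_size)
    chosen = []
    for video_id in videos:
        track_id = rng.choice(sorted(tracks_by_video[video_id]))
        chosen.append((video_id, tracks_by_video[video_id][track_id]))

    triplets = []
    for i, (video_id, frames) in enumerate(chosen):
        # Anchor and positive: two different frames of the same track.
        anchor, positive = rng.sample(frames, 2)
        # Negative: a frame from the next chosen track, i.e. another video.
        neg_video, neg_frames = chosen[(i + 1) % batch_size]
        triplets.append({
            "video": video_id,
            "anchor": anchor,
            "positive": positive,
            "negative": rng.choice(neg_frames),
            "negative_video": neg_video,
        })
    return triplets


# Toy data: video v1 has two tracks that could be the same person.
tracks_by_video = {
    "v1": {"t1": ["v1/t1/f0", "v1/t1/f1", "v1/t1/f2"],
           "t2": ["v1/t2/f0", "v1/t2/f1"]},
    "v2": {"t1": ["v2/t1/f0", "v2/t1/f1"]},
    "v3": {"t1": ["v3/t1/f0", "v3/t1/f1"],
           "t2": ["v3/t2/f0", "v3/t2/f1"]},
}
triplets = sample_triplets(tracks_by_video, batch_size=3, rng=random.Random(42))
```

Note that this sketch treats faces from different videos as different identities, which is itself an assumption of the inter-video prior and can fail if the same person appears in several videos.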
“our method... achieves $P_{pre}$ of 0.9620 [on DFEW]”
Section IV-C · Table V
Main concerns

The privacy preservation ratio $P_{pre}$ excludes pairs drawn from different identities and is computed only on the falsifiable cases, so it never counts cases where different identities are incorrectly matched. This may overstate the privacy guarantee. The paper explicitly acknowledges that the denoising module's theoretical foundations "require further study," yet still claims the module improves privacy preservation without offering a mechanistic justification. Additionally, the comparison with closed-set baselines on CREMA-D and RAVDESS is confounded: closed-set methods are retrained on these target datasets, while the proposed method transfers from DFEW. The performance gaps are nevertheless attributed solely to the closed-set vs. open-set distinction.

“Pairs from different unique IDs are excluded because a prediction of 1 unambiguously indicates privacy leakage, whereas a prediction of 0 is inconclusive, as it may result from successful anonymization or from naturally dissimilar original identities.”
Section III-E · Paragraph 3
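To make the concern concrete, here is a minimal sketch of how such a ratio could be computed under the quoted exclusion rule. It assumes $P_{pre}$ is the fraction of retained same-identity pairs that the matcher fails to match; the function name, the input layout, and this exact definition are assumptions, and the paper's formula may differ.

```python
def privacy_preservation_ratio(pairs):
    """Hypothetical P_pre: share of same-ID pairs predicted as non-matching.

    pairs: iterable of (same_identity, match_pred), where same_identity is
    True when both faces carry the same unique ID, and match_pred is the
    binary matcher output (1 = match, i.e. leakage; 0 = no match).
    """
    # Different-ID pairs are dropped, following the quoted rule: a 0 on
    # such a pair says nothing about whether anonymization worked.
    kept = [pred for same_identity, pred in pairs if same_identity]
    if not kept:
        raise ValueError("no same-identity pairs to evaluate")
    return sum(1 for pred in kept if pred == 0) / len(kept)


# Toy input: three same-ID pairs (one leaks), two different-ID pairs.
toy_pairs = [(True, 0), (True, 0), (True, 1), (False, 0), (False, 1)]
ratio = privacy_preservation_ratio(toy_pairs)
```

In the toy input, the last pair is a different-ID pair with a prediction of 1. It has no effect on the ratio, which is the kind of case the concern above says goes unmeasured.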
“Finally, although the denoising module improves both FER and privacy preservation, its theoretical foundations require further study.”
Section V · Conclusion
Evidence and comparison

The evidence supports the claim that the method achieves FER accuracy comparable to baselines while providing stronger privacy protection in open-set settings: 36.54% on DFEW with an R(2+1)D backbone versus 32.44% for the adversarial approach (Table IV). The SSIM/PSNR metrics in Table VI show the method falling short of closed-set approaches (Contr-HL). This matches the paper's own claim, which is limited to surpassing "all open-set approaches" and concedes a small gap relative to closed-set methods. The falsification-based validation is novel but lacks comparison against standard re-identification attacks, which might reveal vulnerabilities not captured by the binary matching classifier $f_{match}$.

“Ours [achieves] 36.54 [on DFEW with R(2+1)D] vs Adver. 32.44”
Section IV-C · Table IV
“Our method surpasses all open-set approaches on these metrics but shows a small gap relative to closed-set methods.”
Section IV-C · Paragraph 4
Reproducibility

The paper provides comprehensive implementation details, including specific hyperparameters ($\alpha=0.01$, 400 epochs, batch size 1024), pre-trained backbones (RetinaFace, ArcFace), and dataset splits (Section IV-A, VII-B). However, no code repository or data release is mentioned, and either would be critical for reproducing the complex multi-stage pipeline of tracking, privacy preservation, denoising, and falsification-based validation. The paper states that the random seed is fixed at 42, but the exact data preprocessing flow and triplet sampling strategy must be reconstructed carefully from Algorithm 1 and Section III-B.

“Privacy preservation employs U-Net as $f_{pp}$ and ArcFace as $f_e$... with $\alpha=0.01$ over 400 epochs... The random seed is fixed at 42.”
Section IV-A · Implementation paragraph
“The total loss $l := \alpha \times l_{tri} + (1-\alpha) \times l_{bce}$”
Algorithm 1 · Line 11
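The weighting in the quoted loss is easy to misread, so a small numeric sketch may help. It implements only the quoted combination with the reported $\alpha=0.01$; the function name and the example loss values are illustrative assumptions, not figures from the paper.

```python
ALPHA = 0.01  # value reported in the paper's implementation details


def total_loss(l_tri, l_bce, alpha=ALPHA):
    """Combine the two terms as quoted: alpha * l_tri + (1 - alpha) * l_bce."""
    return alpha * l_tri + (1.0 - alpha) * l_bce


# Illustrative values only: a triplet loss of 0.5 and a BCE loss of 0.2.
example = total_loss(l_tri=0.5, l_bce=0.2)

# Share of the combined value contributed by the triplet term.
triplet_share = (ALPHA * 0.5) / example
```

With $\alpha=0.01$, the triplet term carries 1% of the weight and the BCE term 99%, so anyone reproducing the method should confirm which term $\alpha$ multiplies before tuning it.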
Abstract

Facial expression recognition relies on facial data that inherently expose identity and thus raise significant privacy concerns. Current privacy-preserving methods typically fail in realistic open-set video settings where identities are unknown, and identity labels are unavailable. We propose a two-stage framework for video-based privacy-preserving FER in challenging open-set settings that requires no identity labels at any stage. To decouple privacy and utility, we first train an identity-suppression network using intra- and inter-video knowledge priors derived from real-world videos without identity labels. This network anonymizes identity while preserving expressive cues. A subsequent denoising module restores expression-related information and helps recover FER performance. Furthermore, we introduce a falsification-based validation method that uses recognition priors to rigorously evaluate privacy robustness without requiring annotated identity labels. Experiments on three video datasets demonstrate that our method effectively protects privacy while maintaining FER accuracy comparable to identity-supervised baselines.
