Show Me What You Don't Know: Efficient Sampling from Invariant Sets for Model Validation

cs.LG · Armand Rousselot, Joran Wendebourg, Ullrich Köthe · Mar 23, 2026
AI Review
Plain-language introduction

Understanding what information neural network representations discard is crucial for trustworthy ML. This paper proposes methods to sample from invariant sets (fibers) of feature extractors: either by regularizing conditional generative models with a fiber loss, or by guiding pretrained diffusion models via non-linear diffusion trajectory matching (NDTM). The training-free NDTM approach reduces setup time from days to minutes, enabling rapid analysis of model blind spots, including safety-relevant ones in medical imaging.
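For readers new to the term, a fiber of a feature extractor $\phi$ through a reference input $x_{\text{ref}}$ is the set $\{x : \phi(x) = \phi(x_{\text{ref}})\}$. The toy sketch below makes this concrete with a hypothetical $\phi$ that averages the RGB channels of each pixel and therefore discards color. The names `phi`, `in_same_fiber`, and `fiber_loss` are illustrative, not the paper's code, and the squared-Euclidean form of the fiber loss is an assumption on our part.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[Pixel]

def phi(image: Image) -> List[float]:
    """Hypothetical extractor: per-pixel mean of the RGB channels (color is discarded)."""
    return [(r + g + b) / 3.0 for (r, g, b) in image]

def in_same_fiber(x: Image, x_ref: Image, tol: float = 1e-9) -> bool:
    """True if x lies in the fiber phi^{-1}(phi(x_ref)), up to a tolerance."""
    fx, fref = phi(x), phi(x_ref)
    return len(fx) == len(fref) and all(abs(a - b) <= tol for a, b in zip(fx, fref))

def fiber_loss(x: Image, x_ref: Image) -> float:
    """Assumed form of the fiber loss: squared feature mismatch ||phi(x) - phi(x_ref)||^2."""
    return sum((a - b) ** 2 for a, b in zip(phi(x), phi(x_ref)))

# Two-pixel toy "digits" (hypothetical data).
red_digit = [(0.9, 0.0, 0.0), (0.0, 0.0, 0.0)]
blue_digit = [(0.0, 0.0, 0.9), (0.0, 0.0, 0.0)]
dim_digit = [(0.3, 0.0, 0.0), (0.0, 0.0, 0.0)]

print(in_same_fiber(blue_digit, red_digit))  # True: only the color changed
print(in_same_fiber(dim_digit, red_digit))   # False: the intensity changed
```

Recoloring the toy digit keeps it in the fiber, while dimming it leaves the fiber; a sampler that explores this fiber should produce only the first kind of variation.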

Critical review
Verdict
Bottom line

The paper presents a valuable contribution to neural network interpretability by making invariance analysis practical through training-free guidance. The dual approach—conditional models with fiber loss regularization for high-consistency sampling, and NDTM for rapid exploration—addresses real pain points in the field. However, the trade-off between fidelity and consistency remains unresolved, and the NDTM method's computational cost and hyperparameter sensitivity limit its scalability.
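A fidelity/consistency trade-off is easiest to read as a Pareto front: a method is worth keeping only if no other method is at least as good on both axes and strictly better on one. The sketch below extracts such a front, assuming both metrics are expressed as errors (lower is better). The method names and scores are placeholders, not numbers from the paper's Figure 4.

```python
from typing import Dict, List, Tuple

def pareto_front(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the sorted names of non-dominated methods.

    Each score is (fidelity_error, consistency_error); lower is better on both.
    A method is dominated if some other method is no worse on both axes
    and strictly better on at least one.
    """
    front = []
    for name, (f, c) in scores.items():
        dominated = any(
            f2 <= f and c2 <= c and (f2 < f or c2 < c)
            for other, (f2, c2) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)

# Hypothetical placeholder scores, NOT values from the paper.
scores = {
    "method_a": (0.10, 0.40),  # good fidelity, weak consistency
    "method_b": (0.30, 0.05),  # weak fidelity, good consistency
    "method_c": (0.35, 0.45),  # worse than method_a on both axes
}
print(pareto_front(scores))  # ['method_a', 'method_b']
```

When the front contains more than one method, as here, there is no single winner; picking one means deciding how much fidelity to give up for consistency.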

What holds up

The controlled Color MNIST benchmark is methodologically sound: its known ground-truth fiber distributions make it possible to compare architectures directly. The finding that "diffusion/flow matching models clearly coming out on top" for consistency, while "NDTM can achieve much better fidelity", is well supported by the Pareto analysis in Figure 4. The practical case for the training-free approach is compelling: Table B.2 shows that a conditional model only recoups its training cost after roughly 9k samples, so below that volume NDTM is the cheaper option, which suits rapid model iteration. The medical application, which finds that "Qwen-2B places patients with situs inversus (heart on the right side) in the same fiber as typical anatomy", demonstrates real-world safety relevance.

“diffusion/flow matching models clearly coming out on top”
paper · Section 4.1
“Qwen-2B places patients with situs inversus (heart on the right side) in the same fiber as typical anatomy”
paper · Abstract
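The ~9k-sample break-even attributed to Table B.2 can be sanity-checked with a simple cost model: NDTM costs about 1.8 min per sample with no training, while a conditional model costs a one-off training time $T$ plus about 0.3 min per sample (per-sample figures as quoted from Appendix B.2). The sketch below is our own back-of-the-envelope model, not the paper's accounting; $T$ is back-calculated from the break-even, not reported, and real per-sample times will vary with dataset and hardware.

```python
def total_minutes_ndtm(n_samples: int, per_sample: float = 1.8) -> float:
    """Training-free guidance: total cost grows linearly with the sample count."""
    return per_sample * n_samples

def total_minutes_conditional(n_samples: int, train_minutes: float,
                              per_sample: float = 0.3) -> float:
    """Conditional model: one-off training cost plus cheaper per-sample cost."""
    return train_minutes + per_sample * n_samples

def break_even_samples(train_minutes: float, ndtm: float = 1.8,
                       cond: float = 0.3) -> float:
    """Sample count at which both approaches cost the same total time."""
    return train_minutes / (ndtm - cond)

def implied_training_minutes(break_even: float, ndtm: float = 1.8,
                             cond: float = 0.3) -> float:
    """Training time consistent with a given break-even sample count (assumption)."""
    return break_even * (ndtm - cond)

t_train = implied_training_minutes(9_000)  # 13,500 min under this cost model
print(round(t_train / 60 / 24, 1))         # 9.4 (days)
print(total_minutes_ndtm(1_000) < total_minutes_conditional(1_000, t_train))  # True
```

An implied training time of roughly nine days is in line with the "days of training" that the abstract says NDTM replaces.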
Main concerns

The NDTM guidance method can be unstable on high-dimensional data; the authors acknowledge that "the generation process can sometimes be unstable, leading to oversaturated and overtexturized images." This stems from the mismatch between Tweedie's coarse estimate at early timesteps and the fine-grained features of the target. The proposed fix, using re-approximated targets $\phi(\mathbb{E}[x_0|x'_t])$, raises potential information-leakage concerns, though the Color MNIST control experiment in Appendix C.3 attempts to address them. The fidelity/consistency trade-off is compounded by cost: conditional models with fiber loss require expensive training, while NDTM needs $N = 4$ to $8$ gradient steps per timestep, so sampling takes 1.8 min per image versus 0.3 min for conditional models.

“the generation process can sometimes be unstable, leading to oversaturated and overtexturized images”
paper · Section 3.2
“Time per sample NDTM: 1.8 min”
paper · Appendix B.2, Table B.4
“$\mathcal{L}_{\text{terminal}}(x_t, x'_t) = \|\phi(\mathbb{E}[x_0|x_t]) - \phi(\mathbb{E}[x_0|x'_t])\|_2^2$”
paper · Section 3.2, Equation 12
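Equation 12 compares features of two Tweedie estimates rather than of clean images. The sketch below evaluates a loss of that form, assuming a variance-preserving (DDPM-style) forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ and a noise-predicting model, for which Tweedie's formula gives $\mathbb{E}[x_0|x_t] = (x_t - \sqrt{1-\bar\alpha_t}\,\hat\epsilon(x_t))/\sqrt{\bar\alpha_t}$. The zero-noise "denoiser" `zero_eps` and the feature extractor `sum_phi` are hypothetical stand-ins, and a flow-matching prior would need a different estimate of $x_0$.

```python
import math
from typing import Callable, List

Vec = List[float]

def tweedie_x0(x_t: Vec, eps_hat: Vec, alpha_bar_t: float) -> Vec:
    """E[x_0 | x_t] for x_t = sqrt(a) x_0 + sqrt(1 - a) eps, given a noise prediction."""
    s = math.sqrt(alpha_bar_t)
    n = math.sqrt(1.0 - alpha_bar_t)
    return [(xi - n * ei) / s for xi, ei in zip(x_t, eps_hat)]

def terminal_loss(x_t: Vec, x_t_prime: Vec,
                  eps_model: Callable[[Vec], Vec],
                  phi: Callable[[Vec], Vec],
                  alpha_bar_t: float) -> float:
    """||phi(E[x_0|x_t]) - phi(E[x_0|x'_t])||_2^2, the form of the quoted Eq. 12."""
    f = phi(tweedie_x0(x_t, eps_model(x_t), alpha_bar_t))
    f_prime = phi(tweedie_x0(x_t_prime, eps_model(x_t_prime), alpha_bar_t))
    return sum((a - b) ** 2 for a, b in zip(f, f_prime))

def zero_eps(x: Vec) -> Vec:
    """Hypothetical stand-in for a trained noise-prediction network."""
    return [0.0] * len(x)

def sum_phi(x: Vec) -> Vec:
    """Hypothetical extractor that keeps only the sum of entries (ignores their order)."""
    return [sum(x)]

print(terminal_loss([1.0, 3.0], [3.0, 1.0], zero_eps, sum_phi, alpha_bar_t=0.25))  # 0.0
print(terminal_loss([1.0, 3.0], [1.0, 1.0], zero_eps, sum_phi, alpha_bar_t=0.25))  # 16.0
```

With a real denoiser, $\mathbb{E}[x_0|x_t]$ is blurry at high noise levels, so the features compared at early timesteps belong to a coarse image; this is the mismatch the review points to.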
Evidence and comparison

The comparison to prior work by Bordes et al. (2021) and Rombach et al. (2020) is fair and acknowledges limitations: conditional diffusion models achieve better consistency, while NDTM offers higher fidelity without training. On the Cue Conflict dataset of Geirhos et al. (2018), the Texture vs. Shape analysis finds that "DINOv2 representations exhibit a bias towards encoding image content by its shape," while ResNet50 fiber samples suggest that shape information is preserved in the representation but not used by the classifier head. The InceptionV3 FID analysis is clever: when sampling from Inception fibers, PyTorch FID drops to 3.81 while TensorFlow FID stays at 33.52, exposing how FID blind spots can mask distribution shifts.

“DINOv2 representations exhibit a bias towards encoding image content by its shape”
paper · Section 4.3
“InceptionV3 NDTM: TF FID 33.52, PT FID 3.81”
paper · Table 2
“we are able to clearly show visually that SSL (backbone) representation are not invariant to the data augmentations they were trained with”
Bordes et al., 2021 · arXiv:2112.09164 abstract
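The FID blind spot follows directly from how the metric is built: FID is the Fréchet distance between Gaussians fitted to one network's features, so images that lie in that network's fiber have matching features and score near zero even when they differ. The toy sketch below illustrates this with two hypothetical "extractors" standing in for two Inception weight sets. As a simplification we assume diagonal covariances; real FID uses full covariances and a matrix square root.

```python
import math
from statistics import fmean, pvariance
from typing import Callable, List, Sequence

def frechet_distance_diag(feats_a: Sequence[Sequence[float]],
                          feats_b: Sequence[Sequence[float]]) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    Simplification: diagonal covariances, so the trace term
    Tr(S_a + S_b - 2 (S_a S_b)^{1/2}) reduces to sum_d (std_a - std_b)^2.
    Real FID fits full covariances to 2048-d Inception features.
    """
    total = 0.0
    for d in range(len(feats_a[0])):
        col_a = [f[d] for f in feats_a]
        col_b = [f[d] for f in feats_b]
        mean_term = (fmean(col_a) - fmean(col_b)) ** 2
        std_term = (math.sqrt(pvariance(col_a)) - math.sqrt(pvariance(col_b))) ** 2
        total += mean_term + std_term
    return total

def extractor_sampled(img: Sequence[float]) -> List[float]:
    """Hypothetical network whose fiber we sampled: blind to the second pixel."""
    return [img[0]]

def extractor_other(img: Sequence[float]) -> List[float]:
    """Hypothetical second network that sees both pixels."""
    return [img[0], img[1]]

real = [(0.0, 0.0), (2.0, 0.0)]
fiber_samples = [(0.0, 5.0), (2.0, 5.0)]  # identical features under extractor_sampled

def fd(extractor: Callable[[Sequence[float]], List[float]]) -> float:
    return frechet_distance_diag([extractor(x) for x in real],
                                 [extractor(x) for x in fiber_samples])

print(fd(extractor_sampled))  # 0.0: the shift is invisible to this network
print(fd(extractor_other))    # 25.0: a different network exposes it
```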
Reproducibility

The authors provide substantial implementation details in Appendices B and C, including hyperparameters for Color MNIST VAE training (7.9M parameters, 700 epochs, one-cycle LR schedule) and timestep-dependent NDTM guidance strengths $\gamma_t$. However, critical hyperparameters such as the fiber loss weight $\lambda_{\text{fiber}}$ and the number of NDTM correction steps $N$ (typically 4 to 8) must be tuned per dataset. The CheXpert unconditional diffusion model (18M parameters, transformer UNet) is described, but the paper does not clearly state whether its pretrained weights are publicly available. Code availability is mentioned only through the NDTM citation (Pandey et al., 2025), not explicitly for this work's own implementation.

“VAE is made up of 7.9 million parameters... epochs: 700”
paper · Appendix B.1.2
“$\kappa_t=10^{-4}$ and $\tau_t=0$ to be consistently stable choices... $N$ (usually 4 - 8) steps of gradient descent”
paper · Appendix C
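The quoted settings ($N$ usually 4 to 8 gradient-descent steps, $\kappa_t = 10^{-4}$, $\tau_t = 0$) describe an inner optimization run at every sampling timestep. The sketch below shows only the shape of such an inner loop, under loudly stated assumptions: we treat $\kappa_t$ as the weight of a quadratic penalty on a correction $u$ to the current state, ignore $\tau_t$, use a hypothetical linear feature extractor $\phi(x) = Wx$, and skip the denoiser, the Tweedie step, and the sampler update entirely. It is not the NDTM objective of Pandey et al. (2025); the learning rate `lr` is also our own choice.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]

def matvec(W: Mat, x: Vec) -> Vec:
    """Dense matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def feature_mismatch(x: Vec, W: Mat, target: Vec) -> float:
    """||W x - target||^2 for the hypothetical linear extractor phi(x) = W x."""
    return sum((a - b) ** 2 for a, b in zip(matvec(W, x), target))

def correct_state(x: Vec, W: Mat, target: Vec, n_steps: int = 6,
                  kappa: float = 1e-4, lr: float = 0.1) -> Vec:
    """Run n_steps of gradient descent on a correction u to the state x.

    Hypothetical objective (NOT the NDTM objective):
        J(u) = ||W (x + u) - target||^2 + kappa * ||u||^2
    with gradient 2 W^T (W (x + u) - target) + 2 kappa u.
    """
    u = [0.0] * len(x)
    for _ in range(n_steps):
        shifted = [xi + ui for xi, ui in zip(x, u)]
        residual = [a - b for a, b in zip(matvec(W, shifted), target)]
        grad = [
            2.0 * sum(W[r][c] * residual[r] for r in range(len(W))) + 2.0 * kappa * u[c]
            for c in range(len(x))
        ]
        u = [ui - lr * gi for ui, gi in zip(u, grad)]
    return [xi + ui for xi, ui in zip(x, u)]

# Toy setup: the extractor only "sees" the first coordinate.
W = [[1.0, 0.0]]
x = [0.0, 0.7]   # current state at some timestep (hypothetical)
target = [1.0]   # reference features phi(x_ref) (hypothetical)

x_new = correct_state(x, W, target, n_steps=6, kappa=1e-4)
print(feature_mismatch(x, W, target), feature_mismatch(x_new, W, target))
print(x_new[1])  # 0.7: the coordinate the extractor ignores is untouched
```

Because the extractor only sees the first coordinate, the correction moves that coordinate toward the target and leaves the second one exactly where it was: the part of the state the extractor ignores stays free to vary within the fiber.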
Abstract

The performance of machine learning models is determined by the quality of their learned features. They should be invariant under irrelevant data variation but sensitive to task-relevant details. To visualize whether this is the case, we propose a method to analyze feature extractors by sampling from their fibers -- equivalence classes defined by their invariances -- given an arbitrary representative. Unlike existing work where a dedicated generative model is trained for each feature detector, our algorithm is training-free and exploits a pretrained diffusion or flow-matching model as a prior. The fiber loss -- which penalizes mismatch in features -- guides the denoising process toward the desired equivalence class, via non-linear diffusion trajectory matching. This replaces days of training for invariance learning with a single guided generation procedure at comparable fidelity. Experiments on popular datasets (ImageNet, CheXpert) and model types (ResNet, DINO, BiomedClip) demonstrate that our framework can reveal invariances ranging from very desirable to concerning behaviour. For instance, we show how Qwen-2B places patients with situs inversus (heart on the right side) in the same fiber as typical anatomy.
