Uncertainty Quantification for Distribution-to-Distribution Flow Matching in Scientific Imaging
This paper addresses uncertainty quantification (UQ) for distribution-to-distribution flow matching, a setting where models map between well-defined source and target distributions (e.g., unperturbed to drug-treated cell images) rather than from noise to data. The authors propose Bayesian Stochastic Flow Matching (BSFM), which combines two components: Stochastic Flow Matching (SFM), which captures aleatoric uncertainty via learnable diffusion terms, and MCD-Antithetic, a scalable Bayesian method that uses Monte Carlo Dropout and antithetic sampling. Together they decompose total uncertainty into aleatoric and epistemic components for reliable out-of-distribution (OOD) detection in scientific imaging.
The paper presents a principled and well-executed study on UQ for distribution-to-distribution generation, a relatively underexplored area. The marginal-preserving SDE derivation for SFM is theoretically sound, and the extensive multi-dataset evaluation (BBBC021, JUMP, fMRI) demonstrates consistent improvements both in generation quality under distribution shifts and in OOD detection. However, the experimental design raises concerns: the OOD ground truth is curated via filtering that removes 'misleadingly in-distribution' samples based on prediction-error thresholds, which may inflate reported AUROC scores. Additionally, the need to flip the sign of the epistemic uncertainty score (using $-\operatorname{tr}(\hat{E})$ due to 'model collapse') suggests fundamental challenges with the Bayesian approximation that are not fully resolved.
The marginal-preserving stochastic flow derivation, based on the Fokker-Planck equation, provides solid theoretical grounding. As stated in Section 4.1, the SDE $d{\bm{x}}_t = ({\bm{v}}_\theta({\bm{x}}_t,t,c) - \frac{1}{2}\sigma_t^2 s_\phi({\bm{x}}_t,t,c))dt + \sigma_t d{\bm{W}}_t$ 'shares identical marginals' with the deterministic flow while introducing controlled stochasticity. The empirical results in Table 1 show consistent FID improvements across all five distribution-shift scenarios, with particularly strong gains on severe shifts (e.g., BBBC021 Unseen Perturbations: FID 103.73 → 33.29). MCD-Antithetic achieves the best OOD detection performance among the methods compared in Table 2, with AUROC reaching 0.8071 for unseen perturbations versus 0.6849 for SWAG.
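To make the sampling procedure concrete, the following is a minimal Euler–Maruyama sketch of the SDE quoted above. It assumes integration from the source distribution at $t=0$ to the target at $t=1$ on a uniform grid, and treats `v_theta`, `s_phi` and `sigma` as hypothetical user-supplied callables standing in for the paper's learned velocity field, its $s_\phi$ network and the noise schedule $\sigma_t$ (none of which are specified in this review). The drift uses the sign convention exactly as quoted; the sketch does not itself verify the marginal-preservation property.

```python
import numpy as np


def sample_sfm_sde(x0, v_theta, s_phi, sigma, c=None, n_steps=100, rng=None):
    """Euler-Maruyama integration on t in [0, 1] of
        dx_t = (v_theta(x_t, t, c) - 0.5 * sigma(t)**2 * s_phi(x_t, t, c)) dt + sigma(t) dW_t.

    v_theta, s_phi and sigma are hypothetical callables standing in for the learned
    networks and the noise schedule; c is the condition, passed through unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)  # copy so the caller's source sample is not modified
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        sig = sigma(t)
        drift = v_theta(x, t, c) - 0.5 * sig**2 * s_phi(x, t, c)
        dW = rng.normal(scale=np.sqrt(dt), size=x.shape)  # Brownian increment ~ N(0, dt)
        x = x + drift * dt + sig * dW
    return x
```

Setting `sigma` to zero reduces the sampler to a plain Euler integration of the deterministic flow-matching ODE, which is a convenient sanity check when comparing SFM against its deterministic counterpart.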
First, the OOD detection evaluation protocol is problematic. The authors filter OOD samples using 'prediction error measured in the feature space' or SSIM, keeping only 'high-error cases' as OOD (Appendix C). This artificially selects for 'easy' OOD samples and removes ambiguous cases, potentially inflating AUROC. Second, the epistemic uncertainty signal requires an ad hoc sign flip: 'Our choice of -tr instead of tr is motivated by... model collapse leading to overconfidence OOD'. This counter-intuitive behavior, in which epistemic uncertainty decreases under distribution shift, suggests that the MC-Dropout approximation may not reliably capture model uncertainty. Third, Proposition 4.1 assumes Lipschitz continuity and Gaussian posterior concentration for the MAP approximation, assumptions unlikely to hold for deep U-Nets in practice. Finally, the computational cost remains substantial: even the 'sample-efficient' MCD-Antithetic requires 32 generated samples (8 dropout masks × 4 SDE samples) per image, each of which involves a full SDE integration.
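The aleatoric/epistemic split and the sign-flipped score can be illustrated with the standard law-of-total-variance estimator over an $M \times K$ grid of samples (dropout masks × SDE draws). This is an assumption about how $\hat{A}$ and $\hat{E}$ are formed; the paper's exact estimator may differ (for instance, it may operate on feature embeddings rather than raw pixels). The function names are hypothetical, and the anomaly score simply applies the $-\operatorname{tr}(\hat{E})$ convention quoted above.

```python
import numpy as np


def decompose_uncertainty(samples):
    """Law-of-total-variance split of generated samples.

    samples: array of shape (M, K, D) holding M dropout masks, K SDE draws per mask,
    and D flattened output dimensions (pixels or features).
    Returns the traces of the aleatoric, epistemic and total covariance estimates.
    """
    samples = np.asarray(samples, dtype=float)
    per_mask_mean = samples.mean(axis=1)                   # (M, D): mean over SDE draws
    tr_aleatoric = samples.var(axis=1).mean(axis=0).sum()  # E_m[Var_k(x)], summed over D
    tr_epistemic = per_mask_mean.var(axis=0).sum()         # Var_m[E_k(x)], summed over D
    tr_total = samples.reshape(-1, samples.shape[-1]).var(axis=0).sum()
    return float(tr_aleatoric), float(tr_epistemic), float(tr_total)


def epistemic_anomaly_score(samples):
    """Sign-flipped epistemic score -tr(E_hat); larger values are treated as more OOD."""
    _, tr_epistemic, _ = decompose_uncertainty(samples)
    return -tr_epistemic


# Example using the 8 dropout masks x 4 SDE draws budget (random stand-in data, not model output).
rng = np.random.default_rng(0)
samples = rng.normal(size=(8, 4, 16))
aleatoric, epistemic, total = decompose_uncertainty(samples)
```

Because every mask contributes the same number of draws and the variances use `ddof=0`, the aleatoric and epistemic traces add up exactly to the total trace, which makes the decomposition easy to sanity-check.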
The evidence supports the claim that SFM improves generalization, with consistent FID/KID gains across all scenarios in Table 1. The comparison to baselines (CellFlux, UNSB, SDEdit) is fair and comprehensive. However, the OOD detection benchmarking relies on self-selected splits that favor high prediction error, raising questions about generalization to unfiltered OOD data. The comparison between SWAG, Laplace Approximation, and MCD-Antithetic is valuable, though the antithetic sampling benefits are presented without variance estimates or statistical significance tests. The observation that 'epistemic uncertainty performs better on scenarios with severe distribution shifts, whereas aleatoric uncertainty performs better on slight distribution shifts' is well-supported by Table 2 results and provides useful practical guidance, though the underlying mechanism (collapse of conditional variability vs. model agreement) warrants deeper theoretical analysis.
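The missing variance estimates could be supplied with a paired bootstrap over the evaluation samples. The sketch below is a generic technique, not something the paper reports: it computes AUROC with OOD as the positive class and a percentile confidence interval for the AUROC difference between two scoring methods (for example, MCD-Antithetic versus SWAG) evaluated on the same ID and OOD samples in the same order. Function names and defaults are illustrative.

```python
import numpy as np


def auroc(scores_id, scores_ood):
    """AUROC with OOD as the positive class: P(s_ood > s_id) + 0.5 * P(s_ood == s_id)."""
    s_in = np.asarray(scores_id, dtype=float)[:, None]    # column of ID scores
    s_out = np.asarray(scores_ood, dtype=float)[None, :]  # row of OOD scores
    return float(np.mean((s_out > s_in) + 0.5 * (s_out == s_in)))


def paired_bootstrap_auroc_diff(id_a, ood_a, id_b, ood_b, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for AUROC(a) - AUROC(b).

    Both methods must be scored on the same ID and OOD samples, in the same order,
    so each bootstrap replicate resamples identical indices for a and b.
    """
    id_a, ood_a, id_b, ood_b = (np.asarray(v, dtype=float) for v in (id_a, ood_a, id_b, ood_b))
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        ii = rng.integers(0, len(id_a), size=len(id_a))    # resampled ID indices
        oo = rng.integers(0, len(ood_a), size=len(ood_a))  # resampled OOD indices
        diffs[i] = auroc(id_a[ii], ood_a[oo]) - auroc(id_b[ii], ood_b[oo])
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return auroc(id_a, ood_a) - auroc(id_b, ood_b), (float(lower), float(upper))
```

Resampling the same indices for both methods keeps the comparison paired, so sample-level difficulty shared by the two scores largely cancels; an interval that excludes zero is a rough indication that the AUROC gap is not a resampling artifact.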
The paper provides architectural details (U-Net with IMPA conditioning) and training hyperparameters (100–200 epochs, 2 NVIDIA H100 GPUs), but does not reference a public code repository or state whether code will be released. The datasets (BBBC021, JUMP, ds000228 fMRI) are public, enabling reproduction. However, critical implementation details, such as the dropout rates for MC-Dropout, the noise schedule $\sigma_t$ for the SDE, and the exact filtering thresholds ($\mu - 0.5\sigma$, $\mu + 0.5\sigma$) for OOD sample selection, are either unspecified or buried in appendices. The antithetic sampling implementation requires careful handling of Brownian motion negation, which is described in prose but not given as pseudocode. Reproduction would be feasible for an expert practitioner but challenging without released code.
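The Brownian-motion negation mentioned above can be sketched as follows. This is one plausible implementation under stated assumptions, not the authors' code: a uniform Euler–Maruyama grid on $[0,1]$, one frozen dropout mask per outer iteration, and antithetic pairs formed by negating the entire sequence of Brownian increments. `make_drift` is a hypothetical hook; in a real model it would wrap the SFM drift with a fixed dropout mask. With 8 masks and 2 antithetic pairs per mask, the output matches the 8 × 4 sampling budget discussed in this review.

```python
import numpy as np


def euler_maruyama(x0, drift, sigma, increments, dt):
    """Integrate dx = drift(x, t) dt + sigma(t) dW on a uniform grid using pre-drawn increments dW."""
    x = np.array(x0, dtype=float)
    for k, dW in enumerate(increments):
        t = k * dt
        x = x + drift(x, t) * dt + sigma(t) * dW
    return x


def antithetic_samples(x0, drift, sigma, n_pairs, n_steps=100, rng=None):
    """Draw n_pairs Brownian paths W and reuse each one negated (-W); returns 2 * n_pairs endpoints."""
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=float)
    dt = 1.0 / n_steps
    out = []
    for _ in range(n_pairs):
        dW = rng.normal(scale=np.sqrt(dt), size=(n_steps,) + x0.shape)
        out.append(euler_maruyama(x0, drift, sigma, dW, dt))   # path driven by W
        out.append(euler_maruyama(x0, drift, sigma, -dW, dt))  # antithetic path driven by -W
    return np.stack(out)


def mcd_antithetic(x0, make_drift, sigma, n_masks=8, n_pairs=2, n_steps=100, seed=0):
    """Outer loop over dropout masks, inner antithetic SDE draws.

    make_drift(m) is a hypothetical callable returning the drift of the network with
    dropout mask m held fixed. Output shape: (n_masks, 2 * n_pairs, *x0.shape).
    """
    rng = np.random.default_rng(seed)
    return np.stack([
        antithetic_samples(x0, make_drift(m), sigma, n_pairs, n_steps, rng)
        for m in range(n_masks)
    ])
```

For a drift that is linear in $x$ and a fixed starting point, each pair is exactly symmetric about the noise-free path. For a nonlinear network drift, negating the increments typically induces negative correlation between the two endpoints, which is the source of the variance reduction, though this is not guaranteed for arbitrary drifts.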
Distribution-to-distribution generative models support scientific imaging tasks ranging from modeling cellular perturbation responses to translating medical images across conditions. Trustworthy generation requires both reliability (generalization across labs, devices, and experimental conditions) and accountability (detecting out-of-distribution cases where predictions may be unreliable). Uncertainty quantification (UQ)-based approaches are promising candidates for these tasks, yet UQ for distribution-to-distribution generative models remains underexplored. We present a unified UQ framework, Bayesian Stochastic Flow Matching (BSFM), that disentangles aleatoric and epistemic uncertainty. The Stochastic Flow Matching (SFM) component augments deterministic flows with a diffusion term to improve model generalization to unseen scenarios. For UQ, we develop a scalable Bayesian approach, MCD-Antithetic, that combines Monte Carlo Dropout with sample-efficient antithetic sampling to produce effective anomaly scores for out-of-distribution detection. Experiments on cellular imaging (BBBC021, JUMP) and brain fMRI (Theory of Mind) across diverse scenarios show that SFM improves reliability while MCD-Antithetic enhances accountability.