ReDiffuse: Rotation Equivariant Diffusion Model for Multi-focus Image Fusion
Multi-focus image fusion (MFIF) combines source images from different focal planes into a single all-in-focus image. This paper targets a critical flaw in diffusion-based MFIF: defocus blur warps geometric structures, producing artifacts. The authors propose ReDiffuse, which embeds B-Conv (Fourier-series-based rotation-equivariant filters) into a U-Net diffusion backbone. By enforcing that rotations induce predictable feature transformations, the method aims to preserve edge orientation and structural consistency while reducing model size through parameter sharing.
The paper delivers a theoretically grounded approach to geometric consistency in diffusion-based MFIF. The equivariance error analysis (Theorems 3.1–3.4) is rigorous, and empirical results show consistent improvements across four datasets. However, the strong claim of "zero equivariance error" for 4-fold rotations (Corollary 3.5) relies on smoothness assumptions that the authors later acknowledge fail under severe defocus (Section 4.6). Additionally, the comparison methodology appears to use official pretrained weights for the baselines while ReDiffuse is trained on Real-MFF, which introduces a potential data bias and weakens the fairness of the comparison.
The theoretical framework provides concrete, verifiable bounds. Theorem 3.4 proves that the equivariance error scales linearly with mesh size $\delta$, and Corollary 3.6 extends this to arbitrary angles with an $m^{-1}\delta$ term. Empirically, Figure 8 validates that ReDiffuse achieves near-zero equivariance errors compared with baseline architectures. The efficiency gains are also substantial: the paper documents a reduction from 26.91M to 7.55M parameters, achieved by sharing filter parameters across rotation groups (each Rot-E convolution uses $1/m$ of the parameters of a standard convolution).
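To make the notion of equivariance error concrete, the following is a minimal toy sketch, not the paper's B-Conv or its evaluation protocol. It assumes a single 3x3 filter with wrap-around boundaries and a 90-degree rotation, and compares "rotate, then filter" against "filter, then rotate". The helper names `circ_corr3x3` and `equivariance_error` are hypothetical and chosen only for illustration.

```python
import numpy as np

def circ_corr3x3(img, kernel):
    """3x3 cross-correlation with circular (wrap-around) boundaries."""
    out = np.zeros(img.shape, dtype=float)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            # np.roll with shift (-di, -dj) yields img[i + di, j + dj] at position (i, j)
            out += kernel[di + 1, dj + 1] * np.roll(img, shift=(-di, -dj), axis=(0, 1))
    return out

def equivariance_error(f, img, k=1):
    """Relative gap between f(rot(img)) and rot(f(img)) for a k * 90-degree rotation."""
    rotated_then_filtered = f(np.rot90(img, k))
    filtered_then_rotated = np.rot90(f(img), k)
    return (np.linalg.norm(rotated_then_filtered - filtered_then_rotated)
            / np.linalg.norm(filtered_then_rotated))

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))  # synthetic test image (assumption, not real MFIF data)

# A kernel that is unchanged by 90-degree rotation, and one that is not.
laplacian = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
sobel_x = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])

err_symmetric = equivariance_error(lambda x: circ_corr3x3(x, laplacian), img)
err_asymmetric = equivariance_error(lambda x: circ_corr3x3(x, sobel_x), img)
print(err_symmetric, err_asymmetric)
```

In this toy setting the rotation-symmetric kernel gives an error at the level of floating-point round-off, while the directional kernel gives a large one. Rotations by angles that do not map the pixel grid onto itself would additionally require interpolation, which is where discretization terms such as the $\delta$-dependent bounds become relevant.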
The "zero equivariance error" guarantee (Corollary 3.5) applies only to discrete rotations $\theta_k = 2k\pi/m$ and assumes Lipschitz bounds ($G, H$) on latent continuous functions. Section 4.6 admits these assumptions break down under strong defocus spread, creating a contradiction between theory and practice that is not resolved. Furthermore, the experimental comparison is potentially confounded: ReDiffuse was trained on Real-MFF (600 pairs), while competing diffusion methods (FusionDiff, VDMUFuse, etc.) appear to have been evaluated with official pretrained weights rather than retrained on the same split, making metric improvements (0.28–6.64%) difficult to attribute solely to the architectural innovation.
The evidence supports the core claim that rotation equivariance benefits MFIF, but comparisons are incomplete. Table 1 shows ReDiffuse ranks first or second on 23 of 24 metric-dataset combinations, with improvements up to 6.64% on $Q_G$. The generalization study (Table 2) is compelling: retrofitting FusionDiff, VDMUFuse, and Mask-DiFuser with Rot-E modules yields consistent gains (e.g., +0.03 $Q_{abf}$ for FusionDiff), isolating the architectural contribution from training data effects. However, the ablation (Table 3) reveals that replacing B-Conv with F-Conv or G-Conv causes only minor degradation (e.g., $Q_{abf}$ drops from 0.71 to 0.70), suggesting the specific Fourier parameterization is not the primary driver of performance.
Reproducibility is moderate. The authors provide a GitHub repository and specify key hyperparameters: Adam optimizer, batch size 32, 10,000 epochs, initial learning rate $2\times 10^{-4}$ decayed by 0.99 every 1,000 epochs. They clarify that deterministic sampling was used (removing stochastic noise from Eq. 4). However, critical implementation details, including the exact noise schedule $\beta_t$, data augmentation protocols, and random seeds, are relegated to Appendix B rather than given in the main text. Without these, exact reproduction of the diffusion training dynamics is not possible from the main text alone.
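The reported learning-rate schedule can be sketched as follows. This assumes that "decayed by 0.99 every 1,000 epochs" means a staircase (step) decay, that is, the rate is multiplied by 0.99 once per 1,000 completed epochs; the function name `learning_rate` is hypothetical and the authors' actual scheduler may differ.

```python
def learning_rate(epoch, base_lr=2e-4, gamma=0.99, step=1000):
    """Step decay: multiply base_lr by gamma once for every `step` completed epochs."""
    return base_lr * gamma ** (epoch // step)

# Learning rate at a few points of the 10,000-epoch run described in the review.
for epoch in (0, 999, 1000, 5000, 10000):
    print(epoch, learning_rate(epoch))

# Total decay over the full run under this reading of the schedule.
total_decay = learning_rate(10000) / learning_rate(0)
print(f"final/initial = {total_decay:.4f}")
```

Under this reading the rate ends at roughly 90% of its initial value, so the schedule is close to constant over the whole run.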
Diffusion models have achieved impressive performance on multi-focus image fusion (MFIF). However, a key challenge in applying diffusion models to the ill-posed MFIF problem is that defocus blur can make common symmetric geometric structures (e.g., textures and edges) appear warped and deformed, often leading to unexpected artifacts in the fused images. Therefore, embedding rotation equivariance into diffusion networks is essential, as it enables the fusion results to faithfully preserve the original orientation and structural consistency of geometric patterns underlying the input images. Motivated by this, we propose ReDiffuse, a rotation-equivariant diffusion model for MFIF. Specifically, we carefully construct the basic diffusion architectures to achieve end-to-end rotation equivariance. We also provide a rigorous theoretical analysis to evaluate its intrinsic equivariance error, demonstrating the validity of embedding equivariance structures. ReDiffuse is comprehensively evaluated against various MFIF methods across four datasets (Lytro, MFFW, MFI-WHU, and Road-MF). Results demonstrate that ReDiffuse achieves competitive performance, with improvements of 0.28–6.64% across six evaluation metrics. The code is available at https://github.com/MorvanLi/ReDiffuse.