ReDiffuse: Rotation Equivariant Diffusion Model for Multi-focus Image Fusion

cs.CV Bo Li, Tingting Bao, Lingling Zhang, Weiping Fu, Yaxian Wang, Jun Liu · Mar 22, 2026
AI Review
Plain-language introduction

Multi-focus image fusion (MFIF) combines source images from different focal planes into a single all-in-focus image. This paper targets a critical flaw in diffusion-based MFIF: defocus blur warps geometric structures, producing artifacts. The authors propose ReDiffuse, which embeds B-Conv (Fourier-series-based rotation-equivariant filters) into a U-Net diffusion backbone. By enforcing that rotations induce predictable feature transformations, the method aims to preserve edge orientation and structural consistency while reducing model size through parameter sharing.

Critical review
Verdict
Bottom line

The paper delivers a theoretically grounded approach to geometric consistency in diffusion-based MFIF. The equivariance error analysis (Theorems 3.1–3.4) is rigorous, and empirical results show consistent improvements across four datasets. However, the strong claim of "zero equivariance error" for 4-fold rotations (Corollary 3.5) relies on smoothness assumptions that the authors later acknowledge fail under severe defocus (Section 4.6). Additionally, the comparison methodology, which uses official pretrained weights for the baselines while training ReDiffuse on Real-MFF, introduces potential data bias that undermines the fairness claim.

“Under the same conditions as in Theorem 3.4 and $m=4$, the following result is satisfied: $\mathrm{ReDiffuse}(\pi^{I}_{R_{\theta_{k}}}(I))=\pi^{I}_{R_{\theta_{k}}}[\mathrm{ReDiffuse}](I)$”
ReDiffuse (paper) · Corollary 3.5
“Under stronger defocus spread, local structures become unstable and the derivative estimates deviate further from the true geometric patterns. This increases the equivariance error and may lead to incorrect fusion results.”
ReDiffuse (paper) · Section 4.6
“To ensure fairness, all competing methods used the official provided code and weights”
ReDiffuse (paper) · Section 4.1
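
The identity in Corollary 3.5 says that rotating the input and then applying the model gives the same result as applying the model and then rotating the output. A minimal sketch of how such a property can be checked numerically for 90-degree rotations is given here. It uses two toy operators, not ReDiffuse itself: `laplacian` and `horizontal_diff` are stand-ins chosen for illustration, and `equivariance_error` is a hypothetical helper name. The periodic boundary handling is also an assumption made to keep the example exact.

```python
import numpy as np

def equivariance_error(f, image, k=1):
    """Relative gap between f(rotate(image)) and rotate(f(image)) for a k*90 degree rotation."""
    rotated_then_mapped = f(np.rot90(image, k))
    mapped_then_rotated = np.rot90(f(image), k)
    gap = np.linalg.norm(rotated_then_mapped - mapped_then_rotated)
    return gap / np.linalg.norm(mapped_then_rotated)

def laplacian(x):
    # 4-neighbour Laplacian with periodic boundaries.
    # The stencil is unchanged by 90 degree rotations, so the operator is equivariant.
    return (np.roll(x, 1, 0) + np.roll(x, -1, 0)
            + np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4.0 * x)

def horizontal_diff(x):
    # Forward difference along one axis only.
    # The stencil has a preferred direction, so the operator is not equivariant.
    return np.roll(x, -1, 1) - x

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))  # square toy "image"

err_laplacian = equivariance_error(laplacian, image)
err_horizontal = equivariance_error(horizontal_diff, image)
print("laplacian:", err_laplacian)
print("horizontal difference:", err_horizontal)
```

The first error should be zero up to floating-point rounding, while the second is of order one. Replacing the toy operators with a trained fusion network would give an empirical counterpart to the kind of measurement the review attributes to Figure 8, although the paper's exact protocol is not described here.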
What holds up

The theoretical framework provides concrete, verifiable bounds. Theorem 3.4 proves that the equivariance error scales linearly with the mesh size $\delta$, and Corollary 3.6 extends this to arbitrary angles with an additional $m^{-1}\delta$ term. Empirically, Figure 8 validates that ReDiffuse achieves near-zero equivariance errors compared with baseline architectures. The efficiency gains are also substantial: the paper documents a reduction from 26.91M to 7.55M parameters, achieved by sharing filter parameters across the rotation group (each Rot-E convolution uses $1/m$ of the parameters of a standard convolution).

“Due to efficient parameter sharing, each Rot-E convolution uses only $\tfrac{1}{m}$ of the parameters of a regular convolution, where $m$ denotes the equivariant group and is set to $4$. As a result, ReDiffuse is lightweight, reducing the number of parameters from 26.91M to 7.55M.”
ReDiffuse (paper) · Figure 4
“$\|\mathrm{ReDiffuse}(\pi^{I}_{R_{\theta_{k}}}(I))-\pi^{I}_{R_{\theta_{k}}}[\mathrm{ReDiffuse}](I)\| \leq C_{1}\delta$”
ReDiffuse (paper) · Theorem 3.4
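
As a quick sanity check on the parameter figures quoted above, the reported totals can be compared against an idealized case in which every parameter in the network shrinks by exactly $1/m$. The sketch uses only the three reported numbers (26.91M, 7.55M, $m=4$). The idealized case is an assumption introduced here for comparison, not something the paper claims.

```python
baseline_params_m = 26.91   # reported baseline size, in millions
rediffuse_params_m = 7.55   # reported ReDiffuse size, in millions
m = 4                       # reported order of the rotation group

# Ratio actually reported versus the per-layer ratio of 1/m.
observed_ratio = rediffuse_params_m / baseline_params_m
ideal_ratio = 1.0 / m

# Size the model would have if *every* parameter were shared at 1/m (assumption).
ideal_params_m = baseline_params_m * ideal_ratio
gap_m = rediffuse_params_m - ideal_params_m

print(f"observed ratio: {observed_ratio:.3f}")
print(f"per-layer 1/m ratio: {ideal_ratio:.3f}")
print(f"size under uniform 1/m sharing: {ideal_params_m:.2f}M")
print(f"difference from reported size: {gap_m:.2f}M")
```

The observed ratio is about 0.28 rather than 0.25, a difference of roughly 0.8M parameters. That would be consistent with some parts of the network not being subject to the $1/m$ sharing, though the quoted text does not break the count down by layer.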
Main concerns

The "zero equivariance error" guarantee (Corollary 3.5) applies only to discrete rotations $\theta_k = 2k\pi/m$ and assumes Lipschitz bounds ($G, H$) on latent continuous functions. Section 4.6 admits that these assumptions break down under strong defocus spread, leaving a gap between the theoretical guarantee and practical behavior that the paper does not resolve. Furthermore, the experimental comparison is potentially confounded. ReDiffuse was trained on Real-MFF (600 pairs), while competing diffusion methods (FusionDiff, VDMUFuse, etc.) appear to have been evaluated with official pretrained weights rather than retrained on the same split. This makes the metric improvements (0.28–6.64%) difficult to attribute solely to the architectural innovation.

“We train and validate the proposed ReDiffuse on the real-world MFIF dataset Real-MFF (Zhang et al., 2020b), which contains 600 image pairs for training and 110 image pairs for validation”
ReDiffuse (paper) · Section 4.1
“This is a typical case of failure to maintain equivariance... the equivariance error is jointly determined by the local derivatives $G_0$ and $H_0$ and the grid size $\delta$”
ReDiffuse (paper) · Section 4.6
Evidence and comparison

The evidence supports the core claim that rotation equivariance benefits MFIF, but comparisons are incomplete. Table 1 shows ReDiffuse ranks first or second on 23 of 24 metric-dataset combinations, with improvements up to 6.64% on $Q_G$. The generalization study (Table 2) is compelling: retrofitting FusionDiff, VDMUFuse, and Mask-DiFuser with Rot-E modules yields consistent gains (e.g., +0.03 $Q_{abf}$ for FusionDiff), isolating the architectural contribution from training data effects. However, the ablation (Table 3) reveals that replacing B-Conv with F-Conv or G-Conv causes only minor degradation (e.g., $Q_{abf}$ drops from 0.71 to 0.70), suggesting the specific Fourier parameterization is not the primary driver of performance.

“ReDiffuse outperforms the second-best method by margins of 2.63%, 3.89%, 6.64%, 4.01%, 1.21%, and 0.28% on $Q_{abf}$, $Q_{MI}$, $Q_G$, $Q_P$, $Q_E$, and MS-SSIM, respectively”
ReDiffuse (paper) · Table 1
“Consistent performance gains are achieved when applying Rot-E to FusionDiff, VDMUFuse, and Mask-DiFuser”
ReDiffuse (paper) · Table 2
“F-Conv $\to$ B-Conv: $Q_{abf}$ 0.70 vs 0.71”
ReDiffuse (paper) · Table 3
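
To put the ablation gap in perspective, the short sketch below computes the relative change between the two quoted $Q_{abf}$ values and the range of true gaps compatible with them. It assumes the table values are rounded to two decimals, which is how they appear in this review but is not confirmed. `relative_change` is an illustrative helper name.

```python
def relative_change(new, old):
    """Fractional change of `new` with respect to `old`."""
    return (new - old) / old

q_abf_b_conv = 0.71  # quoted Table 3 value for B-Conv
q_abf_f_conv = 0.70  # quoted Table 3 value for F-Conv

rel_gain = relative_change(q_abf_b_conv, q_abf_f_conv)

# Assumption: both values are rounded to two decimals, so each may be off by up to 0.005.
half_step = 0.005
min_gap = max(0.0, (q_abf_b_conv - half_step) - (q_abf_f_conv + half_step))
max_gap = (q_abf_b_conv + half_step) - (q_abf_f_conv - half_step)

print(f"relative gain of B-Conv over F-Conv: {rel_gain:.2%}")
print(f"absolute gap compatible with rounding: {min_gap:.3f} to {max_gap:.3f}")
```

Under that rounding assumption the underlying gap could lie anywhere between almost zero and 0.02, so the quoted figures alone cannot show how much the Fourier parameterization contributes.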
Reproducibility

Reproducibility is moderate. The authors provide a GitHub repository and specify key hyperparameters: Adam optimizer, batch size 32, 10,000 epochs, and an initial learning rate of $2\times 10^{-4}$ that is multiplied by 0.99 every 1,000 epochs. They clarify that deterministic sampling was used (removing the stochastic noise term from Eq. 4). However, critical implementation details, including the exact noise schedule $\beta_t$, data augmentation protocols, and random seeds, are deferred to Appendix B rather than stated in the main text. Without them, the diffusion training dynamics cannot be reproduced exactly from the main text alone.
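
The stated learning-rate schedule can be written out as a small function to see how far the rate actually moves over training. This is a sketch that assumes a staircase (step) decay, with the rate held constant within each block of 1,000 epochs. The paper's wording is compatible with that reading, but the exact implementation is not confirmed here, and `learning_rate` is an illustrative name.

```python
def learning_rate(epoch, base_lr=2e-4, gamma=0.99, step=1000):
    """Step decay: multiply base_lr by gamma once for every `step` completed epochs."""
    return base_lr * gamma ** (epoch // step)

# Inspect the rate at a few points across the 10,000-epoch run.
for epoch in (0, 999, 1000, 5000, 9999):
    print(epoch, learning_rate(epoch))

final_fraction = learning_rate(9999) / learning_rate(0)
print("final rate as a fraction of the initial rate:", final_fraction)
```

Under this reading the rate falls by less than 10% over the whole run, so the schedule is close to a constant learning rate. In PyTorch the same behavior would correspond to `torch.optim.lr_scheduler.StepLR` with `step_size=1000` and `gamma=0.99`, although whether the authors use that scheduler is not stated.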

“Our proposed ReDiffuse is trained for 10,000 epochs with a batch size of 32 using the Adam optimizer. The initial learning rate is set to 0.0002 and decayed by a factor of 0.99 every 1,000 epochs”
ReDiffuse (paper) · Section 4.1
“Given the deterministic nature of MFIF, we remove the stochastic noise term and perform sampling using only the mean of Eq. (4)”
ReDiffuse (paper) · Section 3.2
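
The idea of sampling "using only the mean" can be illustrated with a generic DDPM-style reverse step in which the noise term is optional. This sketch assumes that Eq. (4) has the standard DDPM posterior-mean form, which this review does not confirm. The linear `betas` schedule is hypothetical (the review notes the actual schedule is not given in the main text), and the zero `eps_pred` is a stand-in for a network output.

```python
import numpy as np

def reverse_step(x_t, eps_pred, t, betas, alphas_cumprod, rng=None):
    """One DDPM-style reverse step; with rng=None the noise term is dropped (mean only)."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    # Standard DDPM posterior mean given the predicted noise (assumed form of Eq. 4).
    mean = (x_t - beta_t / np.sqrt(1.0 - alphas_cumprod[t]) * eps_pred) / np.sqrt(alpha_t)
    if rng is None or t == 0:
        return mean  # deterministic update
    sigma_t = np.sqrt(beta_t)  # one common choice of sampling variance
    return mean + sigma_t * rng.standard_normal(x_t.shape)

# Hypothetical linear noise schedule, for illustration only.
betas = np.linspace(1e-4, 2e-2, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

x_t = np.ones((4, 4))
eps_pred = np.zeros((4, 4))  # placeholder for the denoiser's prediction

mean_only_a = reverse_step(x_t, eps_pred, 500, betas, alphas_cumprod)
mean_only_b = reverse_step(x_t, eps_pred, 500, betas, alphas_cumprod)
stochastic = reverse_step(x_t, eps_pred, 500, betas, alphas_cumprod,
                          rng=np.random.default_rng(0))

print("mean-only runs identical:", np.array_equal(mean_only_a, mean_only_b))
print("stochastic run differs:", not np.allclose(mean_only_a, stochastic))
```

With the noise term removed, repeated runs on the same inputs give identical outputs, so random seeds matter only for training and initialization rather than for the sampling loop itself.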
Abstract

Diffusion models have achieved impressive performance on multi-focus image fusion (MFIF). However, a key challenge in applying diffusion models to the ill-posed MFIF problem is that defocus blur can make common symmetric geometric structures (e.g., textures and edges) appear warped and deformed, often leading to unexpected artifacts in the fused images. Therefore, embedding rotation equivariance into diffusion networks is essential, as it enables the fusion results to faithfully preserve the original orientation and structural consistency of geometric patterns underlying the input images. Motivated by this, we propose ReDiffuse, a rotation-equivariant diffusion model for MFIF. Specifically, we carefully construct the basic diffusion architectures to achieve end-to-end rotation equivariance. We also provide a rigorous theoretical analysis to evaluate its intrinsic equivariance error, demonstrating the validity of embedding equivariance structures. ReDiffuse is comprehensively evaluated against various MFIF methods across four datasets (Lytro, MFFW, MFI-WHU, and Road-MF). Results demonstrate that ReDiffuse achieves competitive performance, with improvements of 0.28–6.64% across six evaluation metrics. The code is available at https://github.com/MorvanLi/ReDiffuse.
