SHAPE: Structure-aware Hierarchical Unsupervised Domain Adaptation with Plausibility Evaluation for Medical Image Segmentation
SHAPE addresses unsupervised domain adaptation for medical image segmentation, where models trained on one imaging modality (e.g., MRI) degrade sharply when applied to another (e.g., CT). The core innovation shifts the paradigm from pixel-level correctness to global anatomical plausibility through a DINOv3 foundation model, a Hierarchical Feature Modulation (HFM) module for class-aware alignment, and a Hypergraph Plausibility Estimation (HPE) pipeline that validates pseudo-labels using higher-order anatomical relationships. This matters for deploying robust clinical segmentation models across diverse imaging environments without costly manual re-annotation.
The paper presents a technically sound framework that achieves state-of-the-art results on standard cardiac and abdominal benchmarks. The hierarchical feature modulation and hypergraph-based validation are well-motivated for medical imaging. However, the reliance on a frozen DINOv3 encoder, cited only as an arXiv preprint (arXiv:2508.10104) whose weights may not be publicly available, raises significant reproducibility concerns, and the ablation studies suggest the performance gains derive primarily from the strong foundation model backbone and HFM rather than the hypergraph components themselves.
The Hierarchical Feature Modulation (HFM) module demonstrates clear empirical value, with ablations showing a 3.65% Dice improvement over the DINOv3 baseline alone. The t-SNE visualizations provide compelling evidence that HFM preserves inter-class separability whereas global AdaIN induces 'severe distributional contraction' and 'homogenization of feature representations.' The motivation for anatomical plausibility checking is clinically sound, and the multi-level validation pipeline (HPE for global coherence, SAP for local artifacts) represents a coherent architectural choice for medical segmentation.
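The difference between global and class-aware statistic alignment can be illustrated with a toy one-dimensional sketch. This is not the paper's HFM module, whose internals are not described in this review; the function names `adain` and `classwise_adain`, the scalar features, and the availability of target labels (in practice these would be pseudo-labels) are all assumptions made for illustration.

```python
import statistics

def adain(x, target, eps=1e-8):
    """Shift and scale x so its mean/std match those of target (global AdaIN, 1-D toy)."""
    mx, sx = statistics.fmean(x), statistics.pstdev(x)
    mt, st = statistics.fmean(target), statistics.pstdev(target)
    return [(v - mx) / (sx + eps) * st + mt for v in x]

def classwise_adain(x, x_labels, target, t_labels):
    """Apply adain separately per class (hypothetical class-aware variant)."""
    out = list(x)
    for c in set(x_labels):
        idx = [i for i, lab in enumerate(x_labels) if lab == c]
        tgt = [v for v, lab in zip(target, t_labels) if lab == c]
        if len(idx) < 2 or len(tgt) < 2:
            continue  # too few samples to estimate class statistics
        for i, v in zip(idx, adain([x[i] for i in idx], tgt)):
            out[i] = v
    return out

def class_mean(values, labels, c):
    return statistics.fmean(v for v, lab in zip(values, labels) if lab == c)

# Toy data: the target domain has a different class balance than the source.
src, src_lab = [0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1]
tgt, tgt_lab = [2.0, 4.0, 2.0, 4.0, 20.0, 22.0], [0, 0, 0, 0, 1, 1]

global_out = adain(src, tgt)
class_out = classwise_adain(src, src_lab, tgt, tgt_lab)

# Global alignment matches the overall mean but not the per-class means;
# class-wise alignment matches each class mean to its target counterpart.
print(class_mean(global_out, src_lab, 0), class_mean(class_out, src_lab, 0))
```

In this toy case the global variant matches only the pooled statistics, so the class-0 mean ends up far from the target class-0 mean, whereas the class-wise variant matches it. Whether this mechanism explains the contraction seen in the paper's t-SNE plots is not something the sketch can establish.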
The primary concern is the dependency on DINOv3, which limits reproducibility if the foundation model weights are not publicly accessible. Methodologically, the ablation study reveals that Hypergraph Plausibility Estimation (HPE) alone provides only marginal gains (82.02% to 82.71% for MRI$\to$CT, a 0.69-point increase), suggesting the SOTA performance stems largely from HFM and the DINOv3 backbone rather than the hypergraph formulation itself. Additionally, the purity threshold $\tau_p=1$ (requiring perfectly pure semantic cores) seems arbitrarily strict and potentially brittle for noisy medical labels, and the paper lacks a comparison against simpler graph-based alternatives to justify the hypergraph complexity.
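The brittleness concern about $\tau_p=1$ can be made concrete with a small sketch. The definition of purity used here (the fraction of labels in a region that agree with the majority label) and the helper names `purity` and `is_core` are assumptions for illustration; the paper's exact definition may differ.

```python
from collections import Counter

def purity(labels):
    """Fraction of labels in a region that agree with the majority label (assumed definition)."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def is_core(labels, tau_p=1.0):
    """A region qualifies as a semantic core only if its purity reaches tau_p."""
    return purity(labels) >= tau_p

clean = [2] * 100        # every pixel carries label 2
noisy = [2] * 99 + [3]   # a single mislabeled pixel

print(is_core(clean))               # True
print(is_core(noisy))               # False: purity is 0.99, below tau_p = 1
print(is_core(noisy, tau_p=0.95))   # True under a relaxed threshold
```

Under this assumed definition, a single noisy pixel in a 100-pixel region is enough to disqualify it at $\tau_p=1$, which is why a sensitivity analysis over $\tau_p$ would strengthen the paper.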
The quantitative results on MMWHS and MICCAI 2015/CHAOS benchmarks support the claim of outperforming prior methods like DDFP and IPLC+. However, the comparisons are confounded by the use of DINOv3, which provides substantially stronger generic priors than the CNN backbones used in competing methods. Without a fair comparison where baseline methods also utilize DINOv3 features, it remains unclear whether the gains originate from the backbone or the proposed modules. The anatomical plausibility argument is supported by Figure 3's feature visualizations, but the specific contribution of hypergraphs versus standard graphs is not empirically validated.
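For readers less familiar with the metric behind these comparisons, the following is a minimal sketch of a per-class Dice score averaged over foreground classes. The flat label lists, the function names, and the convention of returning 1.0 when a class is absent from both prediction and ground truth are illustrative assumptions, not the paper's evaluation code.

```python
def dice(pred, target, cls):
    """Dice = 2|P ∩ T| / (|P| + |T|) for one class over flat label sequences."""
    p = [x == cls for x in pred]
    t = [x == cls for x in target]
    inter = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    # Assumed convention: a class absent from both masks counts as a perfect match.
    return 1.0 if denom == 0 else 2.0 * inter / denom

def mean_dice(pred, target, classes):
    """Average Dice over the given foreground classes."""
    return sum(dice(pred, target, c) for c in classes) / len(classes)

pred   = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 2]

print(dice(pred, target, 1))            # 2*1 / (2+1) = 2/3
print(dice(pred, target, 2))            # 2*3 / (3+4) = 6/7
print(mean_dice(pred, target, [1, 2]))  # 16/21
```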
Reproducibility is threatened by the reliance on DINOv3 (arXiv:2508.10104), whose weights may not be publicly available or stable at the time of publication. While the authors provide a GitHub link and detail hyperparameters ($\alpha=0.25$, $\theta_{\mathcal{A}}$ at the 50th percentile, $\gamma_{\text{unsup}}=1$), the frozen foundation model checkpoint creates a critical dependency. The training protocol is documented (200 epochs, batch size 64, AdamW, cosine annealing, EMA momentum 0.9), but without access to the exact DINOv3 weights and the full data preprocessing pipeline, including whether 3D volumes are processed as 2D slices or as full volumes, independent reproduction may be impractical.
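The documented parts of the protocol are standard enough to sketch in a few lines. The following is a rough illustration of a cosine-annealed learning rate over 200 epochs, an exponential moving average with momentum 0.9, and a threshold placed at the 50th percentile of a score list. The base learning rate, the minimum rate of zero, the use of the EMA for a teacher-style weight average, and the meaning of the scores are not stated in this review and are placeholders here.

```python
import math
import statistics

def cosine_lr(epoch, base_lr, total_epochs=200, min_lr=0.0):
    """Standard cosine annealing from base_lr down to min_lr over total_epochs."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cos_term

def ema_update(average, new, momentum=0.9):
    """Element-wise EMA: keep 90% of the running average, blend in 10% of the new values."""
    return [momentum * a + (1.0 - momentum) * n for a, n in zip(average, new)]

def percentile50_threshold(scores):
    """The 50th percentile is the median of the scores."""
    return statistics.median(scores)

base_lr = 1e-4  # hypothetical value; the review does not report the learning rate
for epoch in (0, 100, 200):
    print(epoch, cosine_lr(epoch, base_lr))

print(ema_update([1.0, 0.0], [0.0, 1.0]))            # approximately [0.9, 0.1]
print(percentile50_threshold([0.1, 0.4, 0.2, 0.9]))  # approximately 0.3
```

None of this removes the dependency on the frozen encoder checkpoint. It only shows that the remaining training details are straightforward to reimplement once the weights and preprocessing are fixed.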
Unsupervised Domain Adaptation (UDA) is essential for deploying medical segmentation models across diverse clinical environments. Existing methods are fundamentally limited, suffering from semantically unaware feature alignment that results in poor distributional fidelity and from pseudo-label validation that disregards global anatomical constraints, thus failing to prevent the formation of globally implausible structures. To address these issues, we propose SHAPE (Structure-aware Hierarchical Unsupervised Domain Adaptation with Plausibility Evaluation), a framework that reframes adaptation towards global anatomical plausibility. Built on a DINOv3 foundation, its Hierarchical Feature Modulation (HFM) module first generates features with both high fidelity and class-awareness. This shifts the core challenge to robustly validating pseudo-labels. To augment conventional pixel-level validation, we introduce Hypergraph Plausibility Estimation (HPE), which leverages hypergraphs to assess the global anatomical plausibility that standard graphs cannot capture. This is complemented by Structural Anomaly Pruning (SAP) to purge remaining artifacts via cross-view stability. SHAPE significantly outperforms prior methods on cardiac and abdominal cross-modality benchmarks, achieving state-of-the-art average Dice scores of 90.08% (MRI$\to$CT) and 78.51% (CT$\to$MRI) on cardiac data, and 87.48% (MRI$\to$CT) and 86.89% (CT$\to$MRI) on abdominal data. The code is available at https://github.com/BioMedIA-repo/SHAPE.