SHAPE: Structure-aware Hierarchical Unsupervised Domain Adaptation with Plausibility Evaluation for Medical Image Segmentation

cs.CV cs.AI Linkuan Zhou, Yinghao Xia, Yufei Shen, Xiangyu Li, Wenjie Du, Cong Cong, Leyi Wei, Ran Su, Qiangguo Jin · Mar 23, 2026

AI Review

Plain-language introduction

SHAPE addresses unsupervised domain adaptation for medical image segmentation, where models trained on one imaging modality (e.g., MRI) degrade sharply when applied to another (e.g., CT). The core innovation shifts the paradigm from pixel-level correctness to global anatomical plausibility through a DINOv3 foundation model, a Hierarchical Feature Modulation (HFM) module for class-aware alignment, and a Hypergraph Plausibility Estimation (HPE) pipeline that validates pseudo-labels using higher-order anatomical relationships. This matters for deploying robust clinical segmentation models across diverse imaging environments without costly manual re-annotation.

Critical review

Verdict

Bottom line

The paper presents a technically sound framework that achieves state-of-the-art results on standard cardiac and abdominal benchmarks. The hierarchical feature modulation and hypergraph-based validation are well-motivated for medical imaging. However, the reliance on a frozen DINOv3 encoder—cited as an arXiv preprint (arXiv:2508.10104) that may not be publicly available—raises significant reproducibility concerns, and the ablation studies suggest the performance gains derive primarily from the strong foundation model backbone and HFM rather than the hypergraph components themselves.

“SHAPE significantly outperforms prior methods on cardiac and abdominal cross-modality benchmarks, achieving state-of-the-art average Dice scores of 90.08% (MRI$\to$CT) and 78.51% (CT$\to$MRI) on cardiac data”

paper · Abstract

“Our PyTorch framework uses a pre-trained frozen DINOv3 ViT-S/16 encoder”

paper · Section 4.1

What holds up

The Hierarchical Feature Modulation (HFM) module demonstrates clear empirical value: in the ablations, adding HFM to the DINOv3 baseline raises Dice by 3.65 percentage points (82.02% to 85.67% on MRI$\to$CT). The t-SNE visualizations provide compelling evidence that HFM preserves inter-class separability, whereas global AdaIN induces 'severe distributional contraction' and 'homogenization of feature representations.' The motivation for anatomical plausibility checking is clinically sound, and the multi-level validation pipeline (HPE for global coherence, SAP for local artifacts) represents a coherent architectural choice for medical segmentation.

“Integrating our HFM provides the most significant individual performance improvement, improving the DSC by 3.65% percentage points to 85.67%”

paper · Section 4.3

“This monolithic transformation induces a severe distributional contraction, whereby target features from all classes are indiscriminately aggregated and lose their inter-class separability”

paper · Section 4.2
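
To make the contrast between global and class-aware statistics concrete, here is a minimal toy sketch. It is not the paper's HFM: the 1-D feature values, the class names, and the helper functions are all hypothetical, and the class-aware variant assumes target class labels (in practice, pseudo-labels) are available. It also does not reproduce the contraction seen in the paper's t-SNE plots. It only shows a related, simpler effect: when class proportions differ across domains, matching one global mean and standard deviation leaves each class mean offset from its source counterpart, while per-class matching does not.

```python
from statistics import fmean, pstdev

def adain(x, ref):
    """Shift and scale x so its mean and std match those of ref."""
    mu_x, sd_x = fmean(x), pstdev(x)
    mu_r, sd_r = fmean(ref), pstdev(ref)
    return [(v - mu_x) / sd_x * sd_r + mu_r for v in x]

# Hypothetical 1-D features per class; values are illustrative only.
source = {"myo": [1.0, 1.2, 0.8, 1.0], "blood": [5.0, 5.2, 4.8, 5.0]}
# Target domain: shifted intensities and a different class balance.
target = {"myo": [11.0, 11.2, 10.8, 11.0, 11.1, 10.9], "blood": [13.0, 13.2]}

def pooled(features):
    """Concatenate all class features in dictionary order."""
    return [v for values in features.values() for v in values]

def global_modulation(target, source):
    """One set of statistics shared by all classes (global AdaIN)."""
    out = adain(pooled(target), pooled(source))
    result, start = {}, 0
    for name, values in target.items():
        result[name] = out[start:start + len(values)]
        start += len(values)
    return result

def class_aware_modulation(target, source):
    """Separate statistics per class; assumes target class labels are known."""
    return {name: adain(values, source[name]) for name, values in target.items()}

def mean_gaps(modulated, source):
    """Distance between each modulated class mean and its source class mean."""
    return {name: abs(fmean(modulated[name]) - fmean(source[name])) for name in source}

global_gaps = mean_gaps(global_modulation(target, source), source)
class_gaps = mean_gaps(class_aware_modulation(target, source), source)
print("global AdaIN gaps:", global_gaps)
print("class-aware gaps: ", class_gaps)
```

In this toy setup the global variant leaves both class means visibly off their source positions, while the class-aware variant aligns them up to floating-point error. How closely this mirrors HFM depends on details of the module that this review does not spell out.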

Main concerns

The primary concern is the dependency on DINOv3, which limits reproducibility if the foundation model weights are not publicly accessible. Methodologically, the ablation study reveals that Hypergraph Plausibility Estimation (HPE) alone provides only marginal gains (82.02% to 82.71% for MRI$\to$CT, an increase of 0.69 percentage points), suggesting the SOTA performance stems largely from HFM and the DINOv3 backbone rather than the hypergraph formulation itself. Additionally, the purity threshold $\tau_p=1$ (requiring perfectly pure semantic cores) appears arbitrarily strict and potentially brittle for noisy medical labels, and the paper lacks a comparison against simpler graph-based alternatives that would justify the hypergraph complexity.

“Adding HPE alone also yields a consistent improvement”

paper · Section 4.3

“Key hyperparameters for our method are the HFM purity threshold $\tau_p=1$”

paper · Section 4.1
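
The brittleness concern around $\tau_p=1$ can be illustrated with a small sketch. The definition of purity used here (the fraction of members that carry the most common label) is an assumption, since this review does not reproduce the paper's exact definition; the cluster names, label names, and the `purity` and `select_cores` helpers are likewise hypothetical.

```python
from collections import Counter

def purity(labels):
    """Fraction of members carrying the most common label (assumed definition)."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def select_cores(clusters, tau_p):
    """Keep only clusters whose purity reaches the threshold tau_p."""
    return [name for name, labels in clusters.items() if purity(labels) >= tau_p]

# Hypothetical pseudo-label clusters; label names are illustrative only.
clusters = {
    "core_a": ["LV"] * 20,              # perfectly pure
    "core_b": ["LV"] * 19 + ["MYO"],    # one noisy pseudo-label out of 20
    "core_c": ["AA"] * 12 + ["LA"] * 8, # genuinely mixed
}

strict = select_cores(clusters, tau_p=1.0)
relaxed = select_cores(clusters, tau_p=0.9)
print("tau_p = 1.0:", strict)
print("tau_p = 0.9:", relaxed)
```

Under this assumed definition, a single mislabeled member is enough to exclude an otherwise clean cluster at $\tau_p=1$, whereas a slightly relaxed threshold keeps it and still rejects the mixed one. Whether that trade-off helps or hurts in practice is exactly what a sensitivity study over $\tau_p$ would show.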

Evidence and comparison

The quantitative results on MMWHS and MICCAI 2015/CHAOS benchmarks support the claim of outperforming prior methods like DDFP and IPLC+. However, the comparisons are confounded by the use of DINOv3, which provides substantially stronger generic priors than the CNN backbones used in competing methods. Without a fair comparison where baseline methods also utilize DINOv3 features, it remains unclear whether the gains originate from the backbone or the proposed modules. The anatomical plausibility argument is supported by Figure 3's feature visualizations, but the specific contribution of hypergraphs versus standard graphs is not empirically validated.

“The baseline itself achieves a respectable DSC of 82.02% on the MRI $\to$ CT task”

paper · Section 4.3

“SHAPE achieves a DSC of 90.08%, marking a substantial 5.62% improvement over the second-best method (DDFP)”

paper · Section 4.2
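
The attribution question can be made more concrete with arithmetic on the numbers quoted in this review. The sketch below assumes that all figures refer to the same cardiac MRI$\to$CT setting and that the quoted 5.62% margin over DDFP is an absolute difference in Dice points; neither assumption is confirmed here, and the variable names are illustrative.

```python
# Dice scores (%) for MRI -> CT, as quoted in this review.
baseline = 82.02          # DINOv3 baseline
with_hfm = 85.67          # baseline + HFM
with_hpe = 82.71          # baseline + HPE
full_shape = 90.08        # full SHAPE
margin_over_ddfp = 5.62   # assumed to be absolute percentage points

gain_hfm = round(with_hfm - baseline, 2)
gain_hpe = round(with_hpe - baseline, 2)
gain_total = round(full_shape - baseline, 2)
implied_ddfp = round(full_shape - margin_over_ddfp, 2)

# Share of the total gain over the baseline seen when each module is added alone.
# Module contributions need not be additive, so the shares do not sum to 100%.
share_hfm = round(100 * gain_hfm / gain_total, 1)
share_hpe = round(100 * gain_hpe / gain_total, 1)

print(f"HFM alone:   +{gain_hfm} points ({share_hfm}% of total gain)")
print(f"HPE alone:   +{gain_hpe} points ({share_hpe}% of total gain)")
print(f"Full model:  +{gain_total} points over the baseline")
print(f"Implied DDFP score: {implied_ddfp}")
```

These figures only decompose the ablation numbers; they cannot separate the backbone's contribution from the modules', which would require running the competing methods on DINOv3 features as well.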

Reproducibility

Reproducibility is threatened by the reliance on DINOv3 (arXiv:2508.10104), which may not be publicly available or stable at the time of publication. While the authors provide a GitHub link and detail hyperparameters ($\alpha=0.25$, $\theta_{\mathcal{A}}$ at 50th percentile, $\gamma_{\text{unsup}}=1$), the frozen foundation model checkpoint creates a critical dependency. The training protocol is documented (200 epochs, batch size 64, AdamW, cosine annealing, EMA momentum 0.9), but without access to the exact DINOv3 weights and the full data preprocessing pipeline—including how 3D volumes are handled (2D slices or full volumes)—independent reproduction may be impractical.

“The code is available at https://github.com/BioMedIA-repo/SHAPE”

paper · Abstract

“Key hyperparameters for our method are the HFM purity threshold $\tau_p=1$, a plausibility fusion weight $\alpha=0.25$, a selection percentile $\rho$ starting at 0.1 with a sigmoid ramp-up, an anomaly threshold $\theta_{\mathcal{A}}$ at the 50th percentile of instability scores”

paper · Section 4.1
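
For readers attempting a reproduction, the documented schedule pieces can be sketched in plain Python. This is not the authors' code. The epoch count and EMA momentum come from the protocol summarized above; the base learning rate, the minimum learning rate of zero, the reading of EMA as a teacher-student weight average, and the rule that scores above the 50th percentile are flagged as anomalous are all assumptions.

```python
import math
from statistics import median

EPOCHS = 200          # from the training protocol summarized in this review
EMA_MOMENTUM = 0.9    # from the training protocol summarized in this review
BASE_LR = 1e-4        # hypothetical: the learning rate is not stated here

def cosine_lr(epoch, base_lr=BASE_LR, total=EPOCHS, min_lr=0.0):
    """Cosine annealing from base_lr down to min_lr over `total` epochs."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total))

def ema_update(teacher, student, momentum=EMA_MOMENTUM):
    """teacher <- momentum * teacher + (1 - momentum) * student, per parameter."""
    return [momentum * t + (1 - momentum) * s for t, s in zip(teacher, student)]

def anomaly_mask(instability):
    """Flag entries whose instability score exceeds the 50th percentile (median)."""
    theta = median(instability)
    return [score > theta for score in instability]

print("lr at epochs 0, 100, 200:", cosine_lr(0), cosine_lr(100), cosine_lr(200))
print("EMA step:", ema_update([1.0, 2.0], [0.0, 4.0]))
print("anomaly mask:", anomaly_mask([0.1, 0.4, 0.2, 0.9]))
```

Even with these pieces pinned down, the open items named above (the exact DINOv3 checkpoint and the handling of 3D volumes) would still need to be confirmed against the released repository.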

Abstract

Unsupervised Domain Adaptation (UDA) is essential for deploying medical segmentation models across diverse clinical environments. Existing methods are fundamentally limited, suffering from semantically unaware feature alignment that results in poor distributional fidelity and from pseudo-label validation that disregards global anatomical constraints, thus failing to prevent the formation of globally implausible structures. To address these issues, we propose SHAPE (Structure-aware Hierarchical Unsupervised Domain Adaptation with Plausibility Evaluation), a framework that reframes adaptation towards global anatomical plausibility. Built on a DINOv3 foundation, its Hierarchical Feature Modulation (HFM) module first generates features with both high fidelity and class-awareness. This shifts the core challenge to robustly validating pseudo-labels. To augment conventional pixel-level validation, we introduce Hypergraph Plausibility Estimation (HPE), which leverages hypergraphs to assess the global anatomical plausibility that standard graphs cannot capture. This is complemented by Structural Anomaly Pruning (SAP) to purge remaining artifacts via cross-view stability. SHAPE significantly outperforms prior methods on cardiac and abdominal cross-modality benchmarks, achieving state-of-the-art average Dice scores of 90.08% (MRI$\to$CT) and 78.51% (CT$\to$MRI) on cardiac data, and 87.48% (MRI$\to$CT) and 86.89% (CT$\to$MRI) on abdominal data. The code is available at https://github.com/BioMedIA-repo/SHAPE.
