HMS-VesselNet: Hierarchical Multi-Scale Attention Network with Topology-Preserving Loss for Retinal Vessel Segmentation
HMS-VesselNet addresses the challenge of segmenting thin peripheral retinal vessels in fundus images—a critical task for early diabetic retinopathy detection where standard overlap losses fail due to class imbalance and topological fragmentation. The paper proposes a four-scale hierarchical Attention U-Net architecture with learned fusion weights, combining Dice, binary cross-entropy, and centerline Dice ($\text{clDice}$) losses alongside hard example mining to boost sensitivity on sub-2-pixel vessels. Evaluated on 68 images from DRIVE, STARE, and CHASE_DB1 via 5-fold cross-validation and leave-one-dataset-out protocols, the model achieves $90.78\pm1.42\%$ Sensitivity, demonstrating that explicit topology preservation and targeted hard example oversampling can recover fine vascular structures missed by standard area-based losses.
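The combined Dice + BCE + clDice objective can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The soft skeleton follows the iterated soft-morphology formulation commonly used with clDice. The skeleton iteration count (5) and the equal loss weights are placeholders, because the review reports neither. Helper names such as `combined_loss` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _pool(x, kh, kw, op):
    """Stride-1, same-size pooling of a 2-D map; edge padding keeps borders from acting as background."""
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    return op(sliding_window_view(padded, (kh, kw)), axis=(-2, -1))


def soft_erode(x):
    # Min over a plus-shaped neighbourhood (3x1 and 1x3 windows).
    return np.minimum(_pool(x, 3, 1, np.min), _pool(x, 1, 3, np.min))


def soft_dilate(x):
    # Max over a 3x3 neighbourhood.
    return _pool(x, 3, 3, np.max)


def soft_skeleton(x, iters=5):
    """Soft skeleton: accumulate what each soft opening removes at successive erosions."""
    skel = np.maximum(x - soft_dilate(soft_erode(x)), 0.0)
    for _ in range(iters):
        x = soft_erode(x)
        delta = np.maximum(x - soft_dilate(soft_erode(x)), 0.0)
        skel = skel + np.maximum(delta - skel * delta, 0.0)
    return skel


def dice_loss(pred, true, smooth=1.0):
    inter = np.sum(pred * true)
    return 1.0 - (2.0 * inter + smooth) / (pred.sum() + true.sum() + smooth)


def bce_loss(pred, true, eps=1e-7):
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(true * np.log(p) + (1.0 - true) * np.log(1.0 - p)))


def cldice_loss(pred, true, smooth=1.0, iters=5):
    skel_p, skel_t = soft_skeleton(pred, iters), soft_skeleton(true, iters)
    t_prec = (np.sum(skel_p * true) + smooth) / (np.sum(skel_p) + smooth)  # predicted centerline inside GT
    t_sens = (np.sum(skel_t * pred) + smooth) / (np.sum(skel_t) + smooth)  # GT centerline covered by prediction
    return 1.0 - 2.0 * t_prec * t_sens / (t_prec + t_sens)


def combined_loss(pred, true, w=(1.0, 1.0, 1.0)):
    # Placeholder equal weights; the paper's actual weighting is not reported here.
    return (w[0] * dice_loss(pred, true) + w[1] * bce_loss(pred, true)
            + w[2] * cldice_loss(pred, true))
```

On a toy mask with one wide and one 1-pixel-wide branch, a prediction that keeps only the wide branch still has high Dice overlap, but the clDice term penalizes it because the thin branch's centerline is not covered. This is the behaviour the topology term is meant to add.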
The paper presents a technically sound solution to a clinically significant problem, demonstrating consistent Sensitivity gains through rigorous cross-validation and leave-one-dataset-out (LODO) testing. However, the evaluation relies on a modest corpus of only 68 images, and the comparison with published methods is confounded by protocol differences—specifically, pooled 5-fold cross-validation versus standard dataset-specific splits—which limits claims of state-of-the-art performance. The work is best characterized as a solid methodological contribution rather than a definitive benchmark advancement.
The hierarchical multi-scale design is supported empirically by converged fusion weights that consistently favor the 256$\times$256 branch ($0.521\pm0.043$), suggesting that intermediate resolution best balances vessel-topology context against full-resolution detail. The ablation study identifies hard example mining as the dominant driver of thin-vessel recall, albeit on a single fold, and the LODO experiments demonstrate robust cross-dataset generalization, with AUC remaining above 95% on each unseen dataset and camera type.
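The learned fusion step can be sketched as a convex combination of branch outputs. This NumPy sketch rests on hypothetical assumptions. Each branch emits a square probability map, and maps are brought to a common size by nearest-neighbour upsampling. The four fusion weights are a softmax over learnable logits, so they sum to 1 and a value such as $0.521$ can be read as that branch's share of the fused output. The review names only the 256$\times$256 branch, so the other resolutions and all function names are illustrative. Only the forward pass is shown.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of fusion logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def upsample_nearest(x, size):
    """Nearest-neighbour upsampling of a square map to (size, size); size must be a multiple of x's side."""
    factor = size // x.shape[0]
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def fuse_branches(branch_maps, fusion_logits, out_size):
    """Forward pass of a hypothetical learned fusion over multi-scale branches.

    Each branch map is upsampled to the output size and the results are
    combined with softmax-normalised weights. In training, `fusion_logits`
    would be learnable parameters updated by backpropagation.
    """
    weights = softmax(np.asarray(fusion_logits, dtype=float))
    fused = np.zeros((out_size, out_size))
    for w, m in zip(weights, branch_maps):
        fused += w * upsample_nearest(m, out_size)
    return fused, weights
```

With four branches at, say, hypothetical resolutions 512, 256, 128 and 64, the call would use `out_size=512`. Because the output is a convex combination, each fused probability lies between the smallest and largest branch probabilities at that location.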
The primary limitation is dataset scale. With only 68 images in total, the 5-fold cross-validation estimates have limited statistical power, and because the ablation study relies on a single fold, the reported differences cannot be confirmed statistically. Furthermore, the high-resolution HRF dataset was excluded because its $6.9\times$ downsampling ratio would reduce thin vessels to sub-pixel width. This restricts validation to a narrow range of acquisition conditions and raises questions about applicability to modern high-resolution clinical workflows, where such aggressive downsampling may be unacceptable.
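The sub-pixel concern follows from simple scaling arithmetic. Assuming uniform downsampling by $6.9\times$ along both axes, a vessel of native width $w_{\text{native}}$ pixels has downsampled width

$$
w_{\text{down}} = \frac{w_{\text{native}}}{6.9}, \qquad w_{\text{down}} < 1\,\text{px} \iff w_{\text{native}} < 6.9\,\text{px}, \qquad \text{e.g. } \frac{3\,\text{px}}{6.9} \approx 0.43\,\text{px},
$$

and each downsampled pixel covers about $6.9^2 \approx 47.6$ native pixels. The 3-pixel width is an illustrative value, not one reported in the review.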
The evidence broadly supports the central claim that topology-aware training improves thin-vessel detection. Separately, the hardest CHASE_DB1 cases (Image_13R, 14R, 10R, 04L) underperform consistently across both cross-validation and LODO runs, which suggests that their difficulty stems from image quality rather than training-set composition. However, the authors explicitly acknowledge that the results in Table 7 are 'not directly comparable' because of differing evaluation protocols: most prior work uses dataset-specific splits, whereas HMS-VesselNet uses pooled CV. This undermines any claim of superior absolute performance.
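The protocol difference behind this caveat can be made concrete. The sketch below contrasts two splitting schemes. In pooled k-fold CV, images from all three datasets are shuffled together before folding. In leave-one-dataset-out splitting, a whole dataset is held out. It is illustrative only. Round-robin assignment after a seeded shuffle is an assumption, since the review does not say whether the paper's folds were stratified by dataset. Reusing seed 42 for fold assignment is hypothetical, and the image identifiers are placeholders.

```python
import random
from collections import defaultdict


def pooled_kfold(image_ids, k=5, seed=42):
    """Pooled k-fold CV: shuffle all images together, then deal them round-robin into k folds."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, folds[i]))
    return splits


def leave_one_dataset_out(image_to_dataset):
    """LODO: for each dataset, train on the other datasets and test on the held-out one."""
    by_ds = defaultdict(list)
    for img, ds in image_to_dataset.items():
        by_ds[ds].append(img)
    splits = {}
    for held_out in by_ds:
        train = [img for ds, imgs in by_ds.items() if ds != held_out for img in imgs]
        splits[held_out] = (train, by_ds[held_out])
    return splits
```

Under pooled CV, a test fold typically contains images from the same cameras seen during training. This is one reason pooled-CV scores are not interchangeable with dataset-specific splits or with LODO results.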
Implementation details are comprehensive, specifying PyTorch 2.0, AdamW with a cosine-annealing warm-restart schedule ($T_0=40$, restart multiplier $T_{\text{mult}}=2$), exact augmentation parameters from the Albumentations library, and a fixed random seed (42), with code and model weights promised for public release. However, the 124-million-parameter architecture requires substantial compute (batch size 2 on a 16 GB Tesla T4), and these documented hardware constraints suggest that reproduction may be challenging on consumer-grade GPUs without architectural modification or further downsampling.
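The learning-rate schedule can be checked with a short pure-Python sketch of the standard cosine annealing with warm restarts (SGDR) formula. In PyTorch, the same per-epoch schedule comes from `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=40, T_mult=2)` stepped once per epoch. The function below reproduces it without requiring PyTorch. The peak rate `eta_max` and floor `eta_min` are placeholders, because the review does not state the paper's learning rate.

```python
import math


def sgdr_lr(epoch, eta_max, eta_min=0.0, t0=40, t_mult=2):
    """Learning rate at a given epoch under cosine annealing with warm restarts."""
    t_i, t_cur = t0, epoch
    # Walk through completed cycles; each new cycle is t_mult times longer.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


if __name__ == "__main__":
    for e in (0, 20, 39, 40, 80, 120):
        print(e, sgdr_lr(e, eta_max=1e-4))  # eta_max=1e-4 is a placeholder
```

With $T_0=40$ and $T_{\text{mult}}=2$, cycles last 40, 80, 160, ... epochs, so the rate resets to its peak at epochs 40, 120 and 280.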
Retinal vessel segmentation methods based on standard overlap losses tend to miss thin peripheral vessels because these structures occupy very few pixels and have low contrast against the background. We propose HMS-VesselNet, a hierarchical multi-scale network that processes fundus images across four parallel branches at different resolutions and combines their outputs using learned fusion weights. The training loss combines Dice, binary cross-entropy, and centerline Dice to jointly optimize area overlap and vessel continuity. Hard example mining is applied from epoch 20 onward to concentrate gradient updates on the most difficult training images. Tested on 68 images from DRIVE, STARE, and CHASE_DB1 using 5-fold cross-validation, the model achieves a mean Dice of $88.72\pm0.67\%$, Sensitivity of $90.78\pm1.42\%$, and AUC of $98.25\pm0.21\%$. In leave-one-dataset-out experiments, AUC remains above 95% on each unseen dataset. The largest improvement is in the recall of thin peripheral vessels, which are the structures most frequently missed by standard methods and most critical for early detection of diabetic retinopathy.
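Hard example mining from epoch 20 onward can be illustrated with a loss-proportional sampler. The review does not specify the paper's mining rule, so this is a hypothetical sketch. The start epoch (20) comes from the text. The per-image loss criterion, the `floor` parameter that keeps easy images in rotation, and sampling with replacement are all assumptions.

```python
import random


def mining_weights(per_image_loss, epoch, start_epoch=20, floor=0.1):
    """Per-image sampling probabilities for one epoch (hypothetical mining rule).

    Before `start_epoch` every image is equally likely. From `start_epoch`
    onward, probabilities are proportional to each image's most recent loss
    plus a floor (a fraction of the mean loss), so easy images are never dropped.
    """
    n = len(per_image_loss)
    if epoch < start_epoch:
        return [1.0 / n] * n
    mean_loss = sum(per_image_loss) / n
    raw = [max(loss, 0.0) + floor * mean_loss for loss in per_image_loss]
    total = sum(raw)
    if total == 0.0:  # all losses zero: fall back to uniform sampling
        return [1.0 / n] * n
    return [r / total for r in raw]


def epoch_order(per_image_loss, epoch, seed=42, **kwargs):
    """Draw one epoch's worth of image indices, with replacement, from the mining weights."""
    weights = mining_weights(per_image_loss, epoch, **kwargs)
    rng = random.Random(seed + epoch)
    return rng.choices(range(len(weights)), weights=weights, k=len(weights))
```

Sampling with replacement lets the hardest images appear several times within one epoch, which is one simple way to realize the hard example oversampling described in the summary.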