HamVision: Hamiltonian Dynamics as Inductive Bias for Medical Image Analysis
HamVision proposes using damped harmonic oscillator dynamics as a structured inductive bias for medical image analysis. The core idea is that phase-space decomposition yields three representations that serve both segmentation and classification tasks without modifying the shared bottleneck: position $q$ (features), momentum $p$ (gradients), and energy $H = \frac{1}{2}|z|^2$ (saliency). This physics-constrained approach aims to replace generic learned transformations with interpretable, dynamics-based feature extraction across diverse medical imaging modalities.
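The phase-space picture can be made concrete with a toy recurrence. The sketch below is not the paper's HamVision layer. It assumes a complex state $z = q + ip$, a single channel with scalar frequency `omega` and damping `gamma` (hypothetical values), and a hypothetical rule that adds each input sample to the state. It shows how one complex-valued transition produces position, momentum and $H = \frac{1}{2}|z|^2$ together, and that without input the energy decays geometrically by $e^{-2\gamma\,\Delta t}$ per step.

```python
import numpy as np

def oscillator_scan(u, omega=0.5, gamma=0.1, dt=1.0):
    """Drive a damped oscillator with a 1-D input sequence u (illustrative only).

    Assumes z = q + i*p in phase-space coordinates scaled so that undamped motion
    is a pure rotation: dq/dt = -gamma*q + omega*p, dp/dt = -omega*q - gamma*p.
    The homogeneous part is discretised exactly as z <- exp((-gamma - i*omega)*dt) * z;
    adding u_t to the state after each step is a hypothetical injection scheme.
    """
    a = np.exp((-gamma - 1j * omega) * dt)  # complex-valued transition
    z = 0j
    states = np.empty(len(u), dtype=complex)
    for t, u_t in enumerate(u):
        z = a * z + u_t
        states[t] = z
    q = states.real                  # position: feature content
    p = states.imag                  # momentum
    H = 0.5 * np.abs(states) ** 2    # energy H = 1/2 |z|^2
    return q, p, H

# Free decay after a unit impulse: energy shrinks by exp(-2*gamma*dt) per step.
impulse = np.zeros(20)
impulse[0] = 1.0
q, p, H = oscillator_scan(impulse, omega=0.5, gamma=0.1)
print(np.allclose(H[1:] / H[:-1], np.exp(-2 * 0.1)))  # True
```

Setting `gamma=0.0` turns the transition into a pure rotation, so $H$ stays constant; the damping term is what lets the state forget old inputs.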
The paper presents a compelling physics-informed architecture that constrains feature evolution to Hamiltonian dynamics, producing interpretable phase-space decompositions. While empirical segmentation results are strong and the interpretability analysis is thorough, the work occasionally overstates the novelty of classification results and relies on post-hoc rationalizations for counterintuitive momentum patterns. The theoretical grounding is weakened by circular references to unverified concurrent work, though the core innovation of a shared oscillator bottleneck for both dense and sparse prediction tasks remains sound.
The segmentation results demonstrate consistent improvements over strong baselines including FreqConvMamba and TransUNet with 2–12× fewer parameters, particularly on the challenging ACDC multi-class cardiac MRI task where HamSeg achieves 92.40% Dice (+2.61% over FreqConvMamba). The diagnostic analysis provides rare empirical validation that intermediate representations carry predicted physical semantics: the energy gating mechanism achieves a dynamic range of [0.16, 0.78], confirming functional activity beyond saturation, and the multi-scale boundary detection emerges without explicit boundary supervision as claimed.
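The segmentation figures in this review are Dice scores. For reference, a minimal NumPy sketch of the metric follows. Averaging binary Dice over the foreground classes for multi-class tasks such as ACDC is an assumed convention here, not a protocol detail the review reports, and scoring an empty prediction against an empty target as 1 (via `eps`) is likewise a choice.

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice = 2|P & T| / (|P| + |T|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def mean_foreground_dice(pred_labels, target_labels, classes):
    """Average binary Dice over the listed foreground label ids of two label maps."""
    return float(np.mean([dice(pred_labels == c, target_labels == c) for c in classes]))

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice(pred, target), 4))  # 0.6667: 2 shared pixels, 3 + 3 foreground pixels
```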
The interpretation of momentum as encoding interior > boundary > exterior gradients contradicts standard edge detection theory, where boundaries exhibit maximal gradient magnitude; the paper's post-hoc rationalization involving 'sustained coherent forcing' suggests the representation captures something different from traditional spatial derivatives, undermining the physical interpretation claimed in the abstract. Furthermore, the classification results are selectively reported as 'state-of-the-art' when only BloodMNIST and PathMNIST achieve best performance; on OrganSMNIST, HamCls trails MedMamba-B by 2.44%, and on DermaMNIST it trails MedViT-S in accuracy (77.96% vs 78.0%). The reliance on Mamba-3 [15] to validate the 'algebraic necessity' of complex-valued transitions is problematic given that [15] is an unverified arXiv preprint dated 2026, leaving a circular dependency in the theoretical justification. Finally, the claim of a 'parameter-free saliency map' is misleading: while $H = \frac{1}{2}|z|^2$ itself has no parameters, its extraction and application require learned SE attention weights and projection matrices (Eq. 18).
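The disagreement about momentum can be illustrated with a toy 1-D example. The sketch assumes a damped complex recurrence (hypothetical `omega=0.3`, `gamma=0.2`, one channel) driven by a binary mask profile, and compares the mean magnitude of its momentum with a forward finite difference over hand-picked interior, boundary and exterior indices. It is not the paper's diagnostic, only an illustration of why a driven oscillator's momentum need not behave like a spatial derivative.

```python
import numpy as np

def driven_oscillator(u, omega=0.3, gamma=0.2):
    """Complex state z = q + i*p updated as z <- a*z + u_t (hypothetical toy model)."""
    a = np.exp(-gamma - 1j * omega)
    z, out = 0j, []
    for u_t in u:
        z = a * z + u_t
        out.append(z)
    return np.array(out)

# 1-D mask profile: exterior (0), interior (1), exterior (0)
x = np.concatenate([np.zeros(10), np.ones(20), np.zeros(10)])

grad = np.abs(np.diff(x, append=x[-1]))     # classical forward difference
p = np.abs(driven_oscillator(x).imag)       # oscillator momentum magnitude

idx = np.arange(len(x))
boundary = np.isin(idx, [9, 10, 29, 30])    # samples on either side of the two edges
interior = (idx > 10) & (idx < 29)
exterior = ~(boundary | interior)

for name, v in [("finite difference", grad), ("oscillator |p|", p)]:
    print(f"{name:18s} interior={v[interior].mean():.3f} "
          f"boundary={v[boundary].mean():.3f} exterior={v[exterior].mean():.3f}")
```

With these settings the finite difference is non-zero only at the two edges, whereas $|p|$ builds up under the sustained input and is largest in the interior. That matches the 'sustained coherent forcing' explanation while remaining distinct from a classical edge map.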
The segmentation evidence supports the claim that Hamiltonian dynamics provide useful inductive bias across diverse modalities (dermoscopy, ultrasound, MRI), with qualitative results showing appropriate boundary localization. Comparisons to related work are generally fair, though the paper understates that VM-UNet and FreqConvMamba also use state-space principles, differing primarily in structural constraints rather than fundamental paradigm. The classification evidence is weaker: the phase-space pooling (Eq. 22) concatenates 784 features without ablation studies showing that momentum and energy contribute beyond standard feature pooling, and the claimed 'discriminative power' of these statistics lacks direct comparative evidence against ablated variants without phase-space components.
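One way to run the ablation requested here is to make each phase-space component optional in the pooling step. The sketch below is hypothetical: it assumes per-channel global average pooling of $q$, $|p|$ and $H$ on `(C, height, width)` arrays, so it does not reproduce the 784-dimensional layout of Eq. 22. It only shows how three variants (position only, position plus momentum, full phase space) could be compared under an otherwise identical classifier head.

```python
import numpy as np

def phase_space_pool(q, p, H, use_momentum=True, use_energy=True):
    """Pool (C, height, width) phase-space maps into one feature vector.

    Hypothetical layout: per-channel mean of q, of |p| (momentum is signed, so its
    magnitude is pooled to avoid cancellation), and of H, concatenated in that order.
    """
    parts = [q.mean(axis=(1, 2))]
    if use_momentum:
        parts.append(np.abs(p).mean(axis=(1, 2)))
    if use_energy:
        parts.append(H.mean(axis=(1, 2)))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 7, 7))
p = rng.standard_normal((64, 7, 7))
H = 0.5 * (q ** 2 + p ** 2)

variants = {
    "q only": dict(use_momentum=False, use_energy=False),
    "q + p": dict(use_momentum=True, use_energy=False),
    "q + p + H": dict(use_momentum=True, use_energy=True),
}
for name, flags in variants.items():
    print(name, phase_space_pool(q, p, H, **flags).shape)  # (64,), (128,), (192,)
```

Training the same linear head on each variant would isolate how much momentum and energy add beyond standard feature pooling.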
The authors provide comprehensive architectural details, code availability at the specified GitHub repository, and hyperparameters. However, critical implementation details are absent: the initialization strategy for natural frequencies $\omega_c$ and damping coefficients (whether fixed, learned per-channel, or initialized from physical priors) is unspecified. The gate initialization bias of +2.0 to favor ConvNeXt early in training suggests training instability without this 'warm-start' crutch, raising questions about whether the oscillator can train from scratch without ConvNeXt assistance. Additionally, the four-directional scanning strategy (left-to-right, right-to-left, top-to-bottom, bottom-to-top) may introduce directional bias not present in true 2D convolutions, and the lack of reported standard deviations or statistical significance tests across multiple runs limits confidence in the exact performance metrics.
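Two of these points can be checked numerically. For the gate, if it is a sigmoid weighting the ConvNeXt branch (an assumption), a bias of +2.0 gives $\sigma(2) \approx 0.88$ at initialization, so roughly 88% of the blended signal comes from ConvNeXt. For the scanning, the sketch below runs a hypothetical damped complex recurrence (`scan_rows`, `four_direction_scan`, `omega` and `gamma` are illustrative names) along the four directions and checks behaviour under a horizontal flip. Merging the directions by averaging is also an assumption; the review does not say how they are combined.

```python
import numpy as np

def scan_rows(x, a):
    """Left-to-right recurrence z <- a*z + x along each row of a 2-D map."""
    z = np.zeros(x.shape[0], dtype=complex)
    out = np.empty(x.shape, dtype=complex)
    for j in range(x.shape[1]):
        z = a * z + x[:, j]
        out[:, j] = z
    return out

def four_direction_scan(x, omega=0.3, gamma=0.2):
    """Return the left-right, right-left, top-bottom and bottom-top scans of x."""
    a = np.exp(-gamma - 1j * omega)
    lr = scan_rows(x, a)
    rl = scan_rows(x[:, ::-1], a)[:, ::-1]
    tb = scan_rows(x.T, a).T
    bt = scan_rows(x[::-1, :].T, a).T[::-1, :]
    return lr, rl, tb, bt

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 9))
x_flip = x[:, ::-1]

lr_x = four_direction_scan(x)[0]
lr_f = four_direction_scan(x_flip)[0]
print(np.allclose(lr_f, lr_x[:, ::-1]))          # False: one direction is not flip-equivariant

merged_x = sum(four_direction_scan(x)) / 4        # assumed merge: plain average
merged_f = sum(four_direction_scan(x_flip)) / 4
print(np.allclose(merged_f, merged_x[:, ::-1]))  # True: the symmetric average is
```

A single scan is direction-dependent, which is the bias the review is concerned with. Under the averaging assumption the merged output is flip-equivariant, so whether any directional bias survives depends on how the actual model merges the four directions; unequal learned per-direction weights, for example, would not preserve this property.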
We present HamVision, a framework for medical image analysis that uses the damped harmonic oscillator, a fundamental building block of signal processing, as a structured inductive bias for both segmentation and classification tasks. The oscillator's phase-space decomposition yields three functionally distinct representations: position $q$ (feature content), momentum $p$ (spatial gradients that encode boundary and texture information), and energy $H = \frac{1}{2}|z|^2$ (a parameter-free saliency map). These representations emerge from the dynamics, not from supervision, and can be exploited by different task-specific heads without any modification to the oscillator itself. For segmentation, energy gates the skip connections while momentum injects boundary information at every decoder level (HamSeg). For classification, the three representations are globally pooled and concatenated into a phase-space feature vector (HamCls). We evaluate HamVision across ten medical imaging benchmarks spanning five imaging modalities. On segmentation, HamSeg achieves state-of-the-art Dice scores on ISIC 2018 (89.38%), ISIC 2017 (88.40%), TN3K (87.05%), and ACDC (92.40%), outperforming most baselines with only 8.57M parameters. On classification, HamCls achieves state-of-the-art accuracy on BloodMNIST (98.85%) and PathMNIST (96.65%), and competitive results on the remaining MedMNIST datasets against MedMamba and MedViT. Diagnostic analysis confirms that the oscillator's momentum consistently encodes an interior > boundary > exterior gradient for segmentation and that the energy map correlates with discriminative regions for classification, properties that emerge entirely from the Hamiltonian dynamics. Code is available at https://github.com/Minds-R-Lab/hamvision.
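A decoder-level sketch of how energy gating and momentum injection could combine follows. Everything in it is hypothetical: the gate maps per-channel normalized energy through a sigmoid with fixed scalars `w_gate` and `b_gate`, standing in for the learned SE attention and projections the review attributes to Eq. 18, and momentum enters as an additive term `w_mom * |p|`. The `dynamic_range` helper mirrors the kind of diagnostic behind the reported gate range of [0.16, 0.78].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy_gated_skip(skip, q, p, w_gate=4.0, b_gate=-2.0, w_mom=0.5):
    """Hypothetical HamSeg-style fusion at one decoder level.

    skip, q, p: arrays of shape (C, h, w) at the same resolution. The fixed scalars
    w_gate, b_gate and w_mom stand in for learned parameters.
    """
    H = 0.5 * (q ** 2 + p ** 2)                               # energy, with z = q + i*p
    H_norm = H / (H.max(axis=(1, 2), keepdims=True) + 1e-8)   # per-channel scaling into [0, 1)
    gate = sigmoid(w_gate * H_norm + b_gate)                  # energy-derived saliency gate
    fused = gate * skip + w_mom * np.abs(p)                   # gated skip + momentum injection
    return fused, gate

def dynamic_range(gate):
    """Smallest and largest gate value, a simple check for saturation."""
    return float(gate.min()), float(gate.max())

rng = np.random.default_rng(0)
skip, q, p = rng.standard_normal((3, 16, 32, 32))
fused, gate = energy_gated_skip(skip, q, p)
print(fused.shape, dynamic_range(gate))
```

Because the normalized energy lies in [0, 1), this particular gate is confined to values between $\sigma(-2) \approx 0.12$ and $\sigma(2) \approx 0.88$; a measured range that neither collapses to a point nor sits at these limits is what "functional activity beyond saturation" would look like in such a design.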