Anatomical Token Uncertainty for Transformer-Guided Active MRI Acquisition

cs.CV Lev Ayzenberg, Shady Abu-Hussein, Raja Giryes, Hayit Greenspan · Mar 23, 2026
AI Review
Plain-language introduction

MRI acquisition is inherently slow because k-space is sampled sequentially. This paper proposes TRUST-MRI, an active sampling framework that uses discrete anatomical tokens from the pretrained MedITok tokenizer and a latent Transformer to guide measurement selection. The core idea is to use the entropy of the token predictions as an uncertainty signal, which drives two policies. Latent Entropy Selection (LES) projects patch-wise entropy to k-space to select lines, while Gradient-based Entropy Optimization (GEO) uses gradients of the total entropy with respect to the input measurements. The approach trades pixel-wise fidelity for perceptual quality and computational efficiency: it achieves superior feature-based metrics while running at 0.97 fps, compared to 0.01 fps for diffusion-based active methods.
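To make the idea of an uncertainty-guided acquisition loop concrete, here is a minimal sketch of greedy line selection. It is not the authors' implementation: the function `active_line_selection`, the one-line-per-step schedule, and the toy scorer `distance_to_nearest_acquired` are all illustrative assumptions. In TRUST-MRI the per-line score would come from the token entropy (LES) or its k-space gradient (GEO).

```python
from typing import Callable, List, Set


def active_line_selection(
    num_lines: int,
    initial_lines: Set[int],
    budget: int,
    score_lines: Callable[[Set[int]], List[float]],
) -> List[int]:
    """Greedily pick k-space lines until `budget` lines are acquired.

    `score_lines` is a hypothetical callback that returns one uncertainty
    score per line given the currently acquired set. Acquiring exactly one
    line per step is an assumption of this sketch.
    """
    acquired = set(initial_lines)
    order: List[int] = []
    while len(acquired) < budget:
        candidates = [i for i in range(num_lines) if i not in acquired]
        if not candidates:
            break
        scores = score_lines(acquired)
        # Take the not-yet-acquired line with the highest score.
        best = max(candidates, key=lambda i: scores[i])
        acquired.add(best)
        order.append(best)
    return order


def distance_to_nearest_acquired(num_lines: int) -> Callable[[Set[int]], List[float]]:
    """Toy stand-in for a learned uncertainty score (NOT from the paper)."""
    def score(acquired: Set[int]) -> List[float]:
        return [float(min(abs(i - a) for a in acquired)) for i in range(num_lines)]
    return score


# Start from two central lines and acquire up to four lines in total.
selected = active_line_selection(
    num_lines=8,
    initial_lines={3, 4},
    budget=4,
    score_lines=distance_to_nearest_acquired(8),
)
print(selected)
```

The loop itself does not depend on how the score is produced, so swapping the toy scorer for a model-based one would leave the selection logic unchanged.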

Critical review
Verdict
Bottom line

The paper presents a novel active MRI sampling framework that exploits quantized visual tokens to define a principled uncertainty measure via entropy. While LES and GEO achieve state-of-the-art perceptual metrics (LPIPS, DISTS, SSFD) on fastMRI Knee and Brain at ×8 and ×16 acceleration, they suffer significant degradation in traditional metrics, with PSNR drops of up to 3 dB compared to distortion-optimized baselines. As stated in Section 4, 'The PSNR gap, which reaches up to 3 dB in certain settings, is a recognized limitation of our approach.' The work is well-motivated for scenarios prioritizing anatomical structure over pixel accuracy, though clinical acceptance of this trade-off remains to be validated.

“The PSNR gap, which reaches up to 3 dB in certain settings, is a recognized limitation of our approach.”
paper · Section 4
What holds up

The discrete token formulation provides a well-defined probability distribution, enabling principled uncertainty quantification via Shannon entropy $h_l = -\sum_{k=1}^{K}p(z_k|\mathbf{H}_0)\log p(z_k|\mathbf{H}_0)$. The evaluation is comprehensive, covering both pixel-wise and perceptual metrics, with ablations in Table 2 demonstrating that iterative active sampling ($T=22$) consistently improves over single-step ($T=1$) and random baselines. The efficiency analysis supports the claim of practical viability: as noted in Section 4, LES achieves a throughput of 0.97 fps, offering 'nearly $2\times$ higher throughput' than GEO and running roughly two orders of magnitude faster than the diffusion-based AdaSense (0.01 fps).
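The entropy formula above can be illustrated with a short sketch. The assumptions here are that the Transformer emits one logit vector per token position, that probabilities are obtained with a softmax, and that the logarithm is natural (the formula does not state a base). The helper names `softmax`, `token_entropy`, and `entropy_map` are illustrative, not taken from the paper's code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's codebook logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats: h = -sum_k p_k * log(p_k), skipping zero terms."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_map(logits_per_token: Sequence[Sequence[float]]) -> List[float]:
    """One entropy value per token position."""
    return [token_entropy(softmax(logits)) for logits in logits_per_token]


# Toy codebook of K = 8 entries (the paper reports a codebook of 32768).
K = 8
uniform_logits = [0.0] * K               # maximally uncertain token
peaked_logits = [10.0] + [0.0] * (K - 1)  # confident token
h = entropy_map([uniform_logits, peaked_logits])
print(h)
```

A uniform prediction reaches the maximum entropy $\log K$, while a sharply peaked one approaches zero, which is why high-entropy tokens mark regions where the model is least sure of the anatomy.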

“T denotes sampling steps”
paper · Table 2
“nearly $2\times$ higher throughput”
paper · Section 4
Main concerns

The most critical limitation is the substantial PSNR penalty. On fastMRI Knee at ×8, PUERT achieves 33.63 dB versus approximately 30.2 dB for LES/GEO (Table 1), while on Brain at ×8 PUERT reaches 30.73 dB versus 27.26 dB for GEO. The paper attributes this partly to suppression of acquisition noise, but the VQ-VAE Oracle result (32.14 dB) suggests that the Transformer's token prediction capacity is the primary bottleneck. Furthermore, the approach is restricted to 1D Cartesian masks that select vertical phase-encoding lines, which limits flexibility for 2D or non-Cartesian trajectories. The Brain experiments rely on emulated single-coil (ESC) data rather than true single-coil acquisitions, and AdaSense was omitted from the Brain experiments due to computational constraints, which weakens cross-dataset comparisons.
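As a quick sanity check on the size of the penalty, the sketch below computes the PSNR gaps from the figures quoted above and converts a dB gap into a mean-squared-error ratio using the standard relation $\text{ratio} = 10^{\Delta/10}$. The numbers are copied from this review, not re-derived from the paper's tables, and the Knee value for LES/GEO is only approximate.

```python
# PSNR values (dB) as quoted in this review; the Knee LES/GEO value is approximate.
reported = {
    "Knee x8": {"baseline": 33.63, "ours": 30.2},    # PUERT vs LES/GEO
    "Brain x8": {"baseline": 30.73, "ours": 27.26},  # PUERT vs GEO
}


def mse_ratio(gap_db: float) -> float:
    """MSE ratio implied by a PSNR gap, assuming the same peak value for both methods."""
    return 10 ** (gap_db / 10)


gaps = {name: v["baseline"] - v["ours"] for name, v in reported.items()}

for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.2f} dB, MSE ratio = {mse_ratio(gap):.2f}x")
```

Taken at face value, both gaps come out slightly above 3 dB, which corresponds to a bit more than twice the mean-squared error of the distortion-optimized baseline.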

Evidence and comparison

The comparative evaluation attempts fairness by retraining LOUPE, PUERT, and Ada-Sel under a unified protocol. However, the comparisons are confounded by differing reconstruction architectures (the baselines use U-Net, ISTA-Unfold, and VarNet respectively, while this work uses a Transformer-decoder), so the contribution of the sampling policy cannot be isolated from backbone effects. The Oracle experiment provides valuable diagnostic insight: 'comparisons with the VQ-VAE Oracle indicate that the discrete latent representation itself is not the primary performance bottleneck, but the Transformer's current predictive ability to fully recover the latent sequence.' This suggests the PSNR gap stems from token prediction limitations rather than from the quantization itself.

“comparisons with the VQ-VAE Oracle indicate that the discrete latent representation itself is not the primary performance bottleneck, but the Transformer's current predictive ability to fully recover the latent sequence.”
paper · Section 3.2
Reproducibility

Reproducibility is generally strong: code is available on GitHub, the standard fastMRI dataset is used with specified splits (34K/7K for Knee), and implementation details are thorough ($N=24$ layers, $H=16$ heads, $d=1024$, patch $p=16$, codebook $|\mathcal{Z}|=32768$, AdamW optimizer with $1\times 10^{-4}$ learning rate). However, reproduction depends on the frozen MedITok tokenizer, which requires specific pretrained weights, and the exact preprocessing protocol for emulated single-coil brain data following Tygert et al. may affect comparability with native single-coil studies.
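For anyone attempting a reproduction, the reported hyperparameters can be gathered into a small configuration object. This is a hypothetical sketch: the class name `LatentTransformerConfig` and its field names are invented for illustration and do not come from the TRUST-MRI repository. Only the numeric values are taken from the review above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentTransformerConfig:
    """Hyperparameters as reported in the review; names are illustrative."""
    num_layers: int = 24          # N
    num_heads: int = 16           # H
    model_dim: int = 1024         # d
    patch_size: int = 16          # p
    codebook_size: int = 32768    # |Z|
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4

    @property
    def head_dim(self) -> int:
        """Per-head width, assuming the model dimension is split evenly across heads."""
        if self.model_dim % self.num_heads != 0:
            raise ValueError("model_dim must be divisible by num_heads")
        return self.model_dim // self.num_heads

    @property
    def bits_per_token(self) -> float:
        """Information capacity of one discrete token, log2 of the codebook size."""
        return math.log2(self.codebook_size)


cfg = LatentTransformerConfig()
print(cfg.head_dim, cfg.bits_per_token)
```

Two derived quantities follow directly from the reported values: each attention head is 64 wide if the dimension is split evenly, and each token carries at most 15 bits.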

Abstract

Full data acquisition in MRI is inherently slow, which limits clinical throughput and increases patient discomfort. Compressed Sensing MRI (CS-MRI) seeks to accelerate acquisition by reconstructing images from under-sampled k-space data, requiring both an optimal sampling trajectory and a high-fidelity reconstruction model. In this work, we propose a novel active sampling framework that leverages the inherent discrete structure of a pretrained medical image tokenizer and a latent transformer. By representing anatomy through a dictionary of quantized visual tokens, the model provides a well-defined probability distribution over the latent space. We utilize this distribution to derive a principled uncertainty measure via token entropy, which guides the active sampling process. We introduce two strategies to exploit this latent uncertainty: (1) Latent Entropy Selection (LES), projecting patch-wise token entropy into the $k$-space domain to identify informative sampling lines, and (2) Gradient-based Entropy Optimization (GEO), which identifies regions of maximum uncertainty reduction via the $k$-space gradient of a total latent entropy loss. We evaluate our framework on the fastMRI singlecoil Knee and Brain datasets at $\times 8$ and $\times 16$ acceleration. Our results demonstrate that our active policies outperform state-of-the-art baselines in perceptual metrics and feature-based distances. Our code is available at https://github.com/levayz/TRUST-MRI.
