Focus on Background: Exploring SAM's Potential in Few-shot Medical Image Segmentation with Background-centric Prompting

cs.CV Yuntian Bo, Yazhou Zhu, Piotr Koniusz, Haofeng Zhang · Mar 22, 2026

AI Review

Plain-language introduction

Few-shot medical image segmentation (FSMIS) aims to segment anatomical structures from minimal annotations, but approaches built on the Segment Anything Model (SAM) suffer from over-segmentation because medical boundaries are often ambiguous. This paper reformulates SAM-based FSMIS as a background-centric prompt localization task, proposing FoB (Focus on Background) to generate precise background prompts that constrain SAM’s predictions. By modeling contextual dependencies and ring-like structural priors, the method achieves state-of-the-art performance across CT, MRI, and dermatoscopic imaging while maintaining strong cross-domain generalization.
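To make "background prompts" concrete, here is a minimal sketch of how point prompts are usually packaged for SAM. It relies on SAM's point-label convention (1 for foreground, 0 for background); the function name and the coordinates are hypothetical, and nothing here reproduces FoB's actual prompt generator.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates

FOREGROUND_LABEL = 1  # SAM convention: 1 marks a foreground point
BACKGROUND_LABEL = 0  # SAM convention: 0 marks a background point


def assemble_point_prompts(
    foreground: List[Point], background: List[Point]
) -> Tuple[List[Point], List[int]]:
    """Merge both prompt sets into parallel coordinate and label lists.

    Hypothetical helper for illustration; not taken from the FoB code base.
    """
    coords = list(foreground) + list(background)
    labels = [FOREGROUND_LABEL] * len(foreground) + [BACKGROUND_LABEL] * len(background)
    return coords, labels


# Made-up coordinates: one foreground click surrounded by four background clicks.
fg = [(120.0, 96.0)]
bg = [(60.0, 96.0), (180.0, 96.0), (120.0, 40.0), (120.0, 150.0)]
coords, labels = assemble_point_prompts(fg, bg)
print(labels)
```

The two lists would typically be converted to arrays and passed to SAM as point coordinates and point labels. The background entries tell the model where the mask must not extend, which is the lever the paper uses against over-segmentation.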

Critical review

Verdict

Bottom line

The paper presents a compelling and well-executed solution to SAM’s over-segmentation in medical images by shifting the focus from foreground to background prompting. The plug-and-play design, which avoids costly SAM fine-tuning, is practically valuable, and the structured prompt refinement effectively encodes anatomical priors. However, the argument against medical-domain SAM variants is weakened by the authors’ own results showing superior performance of SAM-Med2D on Skin-DS, and the reliance on pseudo-label quality remains a potential vulnerability.

“FoB is trained independently of SAM and serves as a plug-and-play prompt generator during inference”

Figure 2 caption · Section 3.2

“AM-SAM jointly fine-tunes the SAM model and trains the prompt generator, substantially increasing computational cost”

paper · Section 4.1

What holds up

The technical motivation is rigorously justified: the authors empirically observe that "SAM frequently over-segments medical images" and that "precise background prompts can effectively constrain over-segmentation," whereas prior work like ProtoSAM only provides foreground prompts. The three-stage architecture (BPPC, BCM, SPR) is coherent, with the ring-structured graph constraint $(\mathbf{A}^{ring})_{ij}$ in SPR particularly well-motivated for capturing anatomical background structures. The cross-domain evaluation demonstrates genuine robustness, with FoB outperforming domain-specialized methods by generating transferable point-level prompts rather than brittle appearance-based features.

“SAM frequently over-segments medical images”

paper · Section 1

“precise background prompts can effectively constrain over-segmentation”

paper · Section 1
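The review does not spell out how $(\mathbf{A}^{ring})_{ij}$ is defined, so the following is only one plausible reading: a cycle graph in which background prompts are ordered around the target and each prompt is linked to its two cyclic neighbors. The function name, the binary weights, and the choice of 10 nodes are assumptions made for illustration.

```python
from typing import List


def ring_adjacency(n: int) -> List[List[int]]:
    """Adjacency matrix of a cycle graph on n nodes.

    Assumed stand-in for a ring-structured constraint; the paper's
    actual A^ring may use different weights or connectivity.
    """
    if n < 3:
        raise ValueError("a ring needs at least 3 nodes")
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n  # next node around the ring, wrapping at the end
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


A = ring_adjacency(10)  # 10 is an illustrative node count
print(sum(A[0]))  # each node has exactly two neighbors
```

Under this reading, the constraint encodes that neighboring background prompts should sit next to each other along a closed contour around the organ, rather than being scattered independently.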

Main concerns

The paper claims medical SAM variants are unsuitable for FSMIS due to protocol violations, yet Table 1 shows FoB+S-2D (SAM-Med2D) significantly surpassing FoB+SAM on Skin-DS (84.80% vs 76.62%), which undercuts the case for preferring the general-purpose SAM on every dataset. The authors acknowledge this discrepancy but attribute it to potential training-test data overlap without concrete evidence. Furthermore, the training pipeline relies heavily on pseudo-labels generated via "3D supervoxel clustering" or SLIC superpixels, so reproducibility is sensitive to preprocessing hyperparameters that are not the paper’s core contribution. The fixed choice of $N_p = N_f = 10$ prompts also appears to be empirically tuned, with no theoretical justification for why this number best balances constraint and flexibility.

“FoB+S-2D significantly surpasses it on Skin-DS”

paper · Section 4.1

“The 3D supervoxel clustering method is utilized to generate pseudo-masks for supervision during training”

paper · Section 4

Evidence and comparison

The comparisons cover relevant baselines including ALPNet, RPT, GMRD, ProtoSAM, and the concurrent AM-SAM, with results showing FoB "outperforms other baselines by large margins" on abdominal datasets. The concurrent AM-SAM achieves comparable performance on Abd-CT (86.19% vs 86.21%) despite requiring full SAM fine-tuning, suggesting FoB’s efficiency advantage is its primary differentiator rather than absolute accuracy. The cross-domain experiments (CT$\to$MRI and MRI$\to$CT) provide strong evidence for generalization, with FoB+SAM achieving 73.30% and 67.02% respectively, well above RobustEMD and FAMNet. However, the Skin-DS benchmark is novel to this work, preventing direct comparison with prior FSMIS methods on this modality.

“outperforms other baselines by large margins”

paper · Abstract

“FoB provides accurate prompts across domains and significantly outperforms methods tailored for CD-FSS”

paper · Section 4.2
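The size of the gaps matters for how the comparisons above should be read. A small calculation, using only the scores quoted in this review and a hypothetical helper name, puts the two headline differences side by side.

```python
def gap_in_points(a: float, b: float) -> float:
    """Absolute difference between two percentage scores, in percentage points."""
    return round(abs(a - b), 2)


# Scores as quoted in the review text.
skin_ds_gap = gap_in_points(84.80, 76.62)  # FoB+S-2D vs FoB+SAM on Skin-DS
abd_ct_gap = gap_in_points(86.19, 86.21)   # AM-SAM comparison on Abd-CT

print(skin_ds_gap, abd_ct_gap)
```

The Abd-CT difference is a few hundredths of a point, which supports reading it as a tie, while the Skin-DS difference is several points in favor of the medical-domain SAM variant.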

Reproducibility

The authors release their implementation ("Our code is available at https://github.com/primebo1/FoB_SAM") and provide detailed experimental settings: ResNet-101 backbone pre-trained on MS-COCO, ViT-H SAM, 36K iterations with batch size 1, Adam optimizer with learning rate $1\times 10^{-4}$, and specific loss weights $\lambda_1 = 1\times 10^3$, $\lambda_2 = 1\times 10^{-4}$. However, reproducibility depends critically on the supervoxel/superpixel pseudo-labeling pipeline, particularly the SLIC parameters (compactness=15, 5 superpixels) for Skin-DS, since these preprocessing steps could behave differently across implementation versions. The supplementary material contains extensive ablations on hyperparameters like $\sigma$, $\kappa$, and $k$, which helps, but the main paper lacks discussion of sensitivity to the foreground threshold $\mathcal{T}=0.9$ and the contrastive temperature $\tau=0.1$.

“Our code is available at https://github.com/primebo1/FoB_SAM”

paper · Abstract

“trained on an NVIDIA RTX 4080S GPU for 36K iterations with a batch size of 1... Adam optimizer with an initial learning rate of $1\times 10^{-4}$”

paper · Section 4
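For anyone attempting a reproduction, it may help to pin the reported settings in one place. The sketch below collects the values quoted in this review into a frozen dataclass; the class and field names are hypothetical and do not mirror the configuration format of the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSettings:
    """Values as quoted in the review; field names are illustrative only."""

    backbone: str = "ResNet-101"
    backbone_pretraining: str = "MS-COCO"
    sam_variant: str = "ViT-H"
    iterations: int = 36_000
    batch_size: int = 1
    optimizer: str = "Adam"
    initial_learning_rate: float = 1e-4
    lambda_1: float = 1e3
    lambda_2: float = 1e-4
    num_prompts: int = 10  # N_p = N_f = 10
    foreground_threshold: float = 0.9  # T
    contrastive_temperature: float = 0.1  # tau
    slic_compactness: float = 15.0  # Skin-DS pseudo-labels
    slic_num_superpixels: int = 5  # Skin-DS pseudo-labels


settings = ReportedSettings()
print(settings)
```

Freezing the dataclass prevents accidental changes during a run, which makes it easier to attribute any deviation from the reported numbers to the pseudo-labeling step rather than to a drifting hyperparameter.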

Abstract

Conventional few-shot medical image segmentation (FSMIS) approaches face performance bottlenecks that hinder broader clinical applicability. Although the Segment Anything Model (SAM) exhibits strong category-agnostic segmentation capabilities, its direct application to medical images often leads to over-segmentation due to ambiguous anatomical boundaries. In this paper, we reformulate SAM-based FSMIS as a prompt localization task and propose FoB (Focus on Background), a background-centric prompt generator that provides accurate background prompts to constrain SAM's over-segmentation. Specifically, FoB bridges the gap between segmentation and prompt localization by category-agnostic generation of support background prompts and localizing them directly in the query image. To address the challenge of prompt localization for novel categories, FoB models rich contextual information to capture foreground-background spatial dependencies. Moreover, inspired by the inherent structural patterns of background prompts in medical images, FoB models this structure as a constraint to progressively refine background prompt predictions. Experiments on three diverse medical image datasets demonstrate that FoB outperforms other baselines by large margins, achieving state-of-the-art performance on FSMIS, and exhibiting strong cross-domain generalization. Our code is available at https://github.com/primebo1/FoB_SAM.
