Positional Segmentor-Guided Counterfactual Fine-Tuning for Spatially Localized Image Synthesis

cs.CV cs.AI Tian Xia, Matthew Sinclair, Andreas Schuh, Fabio De Sousa Ribeiro, Raghav Mehta, Rajat Rasal, Esther Puyol-Antón, Samuel Gerber, Kersten Petersen, Michiel Schaap, Ben Glocker · Mar 22, 2026
Plain-language introduction

Existing counterfactual image generation methods produce either global changes or require tedious user-defined masks. This paper proposes Positional Seg-CFT, which subdivides anatomical structures into regional segments (e.g., proximal, mid, distal) and derives independent measurements per region from pretrained segmentors. The extension enables spatially localized interventions for modeling regional disease progression, demonstrated on coronary CT angiography.

Critical review
Verdict
Bottom line

The paper presents a technically sound but incremental extension of Segmentor-guided Counterfactual Fine-Tuning (Seg-CFT). By partitioning structures into regional segments and computing area measurements within each region, Pos-Seg-CFT achieves measurable improvements in spatial localization over global regression-based supervision (Reg-CFT). However, the contribution is primarily architectural (replacing global aggregation with masked regional aggregation) rather than algorithmic. The method also builds on a self-cited companion paper (Xia et al., 2025) that introduces Seg-CFT and appears to be unpublished, which weakens the standalone contribution.
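
The difference between global aggregation and masked regional aggregation mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the region labels, the column-based partition, and the function names `global_area` and `regional_areas` are all assumptions made for illustration, since the review does not state how the paper defines region boundaries.

```python
# Hypothetical region names; the paper's actual partition rule is not given here.
REGIONS = ("proximal", "mid", "distal")


def global_area(mask):
    """Count foreground pixels over the whole binary mask (one scalar per structure)."""
    return sum(sum(row) for row in mask)


def regional_areas(mask, region_map):
    """Count foreground pixels separately inside each labeled region."""
    areas = {name: 0 for name in REGIONS}
    for mask_row, region_row in zip(mask, region_map):
        for is_foreground, region in zip(mask_row, region_row):
            if is_foreground and region in areas:
                areas[region] += 1
    return areas


# Toy 3x6 plaque mask. Assumed layout: columns 0-1 proximal, 2-3 mid, 4-5 distal.
plaque_mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
region_map = [["proximal", "proximal", "mid", "mid", "distal", "distal"]] * 3

total = global_area(plaque_mask)                      # 6
per_region = regional_areas(plaque_mask, region_map)  # proximal 3, mid 1, distal 2
```

The global measurement collapses the structure to a single number, so supervision on it cannot say where a change should occur. The regional version yields one measurement per segment, which is what would allow an intervention on one region to be supervised independently of the others.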

What holds up

The core insight that global regressors leverage spurious correlations (e.g., vessel brightness) rather than anatomical cues is well-supported by prior literature and validated here. Table 1 demonstrates that Pos-Seg-CFT consistently achieves lower effectiveness error $d(\mathbf{\widehat{pa}}_{\mathbf{x}},\mathbf{\widetilde{pa}}_{\mathbf{x}})$ across all positional regions compared to both No-CFT and Reg-CFT baselines. Figure 1 provides convincing visual evidence that Pos-Seg-CFT reduces “positional leakage,” with difference maps showing localized changes confined to intervened regions (proximal, mid, or distal) versus the global artifacts observed under Reg-CFT.

“Pos-Seg-CFT achieves the most localized and stable effects, consistently reducing non-target activations while preserving strong targeted changes.”
paper · Table 1
“The baseline Reg-CFT exhibits noticeable positional leakage... In contrast, Pos-Seg-CFT produces well-localized and anatomically consistent modifications confined to the intended regions.”
paper · Figure 1
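
To make the effectiveness error and "positional leakage" discussed above concrete, here is a small sketch under stated assumptions. It assumes that $\mathbf{\widehat{pa}}_{\mathbf{x}}$ denotes measurements taken from the generated counterfactual, that $\mathbf{\widetilde{pa}}_{\mathbf{x}}$ denotes the intervention targets, and that $d$ is an absolute difference; the review does not confirm any of these. The function names and all numbers are invented for illustration and are not results from the paper.

```python
REGIONS = ("proximal", "mid", "distal")


def effectiveness_error(measured, target):
    """Per-region |measured - target|; absolute difference is an assumed choice of d."""
    return {r: abs(measured[r] - target[r]) for r in REGIONS}


def off_target_change(original, measured, intervened):
    """Mean absolute change in regions that were NOT intervened on (a leakage proxy)."""
    others = [r for r in REGIONS if r != intervened]
    return sum(abs(measured[r] - original[r]) for r in others) / len(others)


# Toy values only: regional plaque areas before the intervention.
original = {"proximal": 3.0, "mid": 1.0, "distal": 2.0}
# Intervene on the mid region alone, leaving the other targets unchanged.
target = dict(original, mid=4.0)

# Two hypothetical counterfactual outcomes, as measured by an evaluation segmentor.
localized = {"proximal": 3.0, "mid": 3.5, "distal": 2.0}
leaky = {"proximal": 4.0, "mid": 3.5, "distal": 3.0}

localized_error = effectiveness_error(localized, target)
localized_leak = off_target_change(original, localized, "mid")  # 0.0
leaky_leak = off_target_change(original, leaky, "mid")          # 1.0
```

In this toy setup both outcomes miss the mid-region target by the same amount, yet only the second one alters the proximal and distal regions. Reporting the two quantities separately is one way to distinguish a weak intervention from a poorly localized one.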
Main concerns

The paper assumes that all variables are independent, stating explicitly that "all variables are treated as independent of each other." This is biologically implausible for coronary anatomy, where proximal and distal plaque distributions are physiologically coupled through hemodynamic and biological factors. The paper also lacks an ablation on the number or definition of regions: why three regions, and how sensitive are the results to boundary placement? Finally, the work is predicated on Seg-CFT (Xia et al., 2025), cited as "[xia2025segmentor]" but not available for verification, which makes the incremental contribution difficult to assess independently.

“For simplicity, all variables are treated as independent of each other.”
paper · Section 4
“In [xia2025segmentor], Reg-CFT was evaluated for structure-specific interventions... The results showed that Reg-CFT often produced global rather than localized changes.”
paper · Section 1
Evidence and comparison

The quantitative evidence supports the claim of improved localization, with Pos-Seg-CFT showing reduced off-target effects across all nine variable-region combinations (Table 1). However, the comparison is limited to internal baselines (No-CFT and Reg-CFT) from the same research group, with no comparison to external methods such as pixel-level diffusion guidance or other medical image counterfactual approaches. The effectiveness metric measures scalar alignment but does not capture clinical fidelity: whether cardiologists would accept the generated plaque patterns as realistic disease progression remains unvalidated.

“Quantitative effectiveness is summarized in Table 1, reporting effect magnitudes across positional regions... Pos-Seg-CFT achieves the most localized and stable effects.”
paper · Section 4
Reproducibility

Reproducibility is severely limited. The experiments use an "internal coronary computed tomography angiography (CCTA) dataset" of 65,706 images, which is not publicly available. No code repository or implementation details (e.g., segmentor architecture, training hyperparameters, learning rates) are provided. While the paper states that "the segmentor used for evaluation is trained independently from the one used for fine-tuning," it does not specify whether the segmentor weights or training protocol will be released. The combination of proprietary data and absent code makes independent reproduction impossible.

“We conduct experiments on an internal coronary computed tomography angiography (CCTA) dataset [taylor2023]... In total, 65,706 CCTA images were processed.”
paper · Section 4
“The segmentor used for evaluation is trained independently from the one used for fine-tuning to ensure an unbiased assessment.”
paper · Section 4
Abstract

Counterfactual image generation enables controlled data augmentation, bias mitigation, and disease modeling. However, existing methods guided by external classifiers or regressors are limited to subject-level factors (e.g., age) and fail to produce localized structural changes, often resulting in global artifacts. Pixel-level guidance using segmentation masks has been explored, but requires user-defined counterfactual masks, which are tedious and impractical. Segmentor-guided Counterfactual Fine-Tuning (Seg-CFT) addressed this by using segmentation-derived measurements to supervise structure-specific variables, yet it remains restricted to global interventions. We propose Positional Seg-CFT, which subdivides each structure into regional segments and derives independent measurements per region, enabling spatially localized and anatomically coherent counterfactuals. Experiments on coronary CT angiography show that Pos-Seg-CFT generates realistic, region-specific modifications, providing finer spatial control for modeling disease progression.
