ShapDBM: Exploring Decision Boundary Maps in Shapley Space

cs.HC cs.LG Luke Watkin, Daniel Archambault, Alex Telea · Mar 23, 2026

Plain-language introduction

ShapDBM addresses the fragmentation problem in Decision Boundary Maps (DBMs) by transforming data into Shapley space before applying dimensionality reduction. This creates more compact decision zones that reflect model behavior rather than raw data distribution, enabling high-quality visualization of complex datasets like SVHN where traditional data-space DBMs fail.
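To make the idea of "Shapley space" concrete, the following is a minimal sketch, not the authors' pipeline. It computes exact Shapley values for a tiny hypothetical model by enumerating feature orderings, with absent features replaced by a baseline value. The names `toy_model`, `shapley_values`, and the zero baseline are assumptions made for illustration; the reviewed paper estimates Shapley values with DeepExplainer rather than computing them exactly.

```python
from itertools import permutations
from math import dist, factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        z = list(baseline)
        prev = f(z)
        for i in order:
            z[i] = x[i]          # reveal feature i
            cur = f(z)
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    return [p / factorial(n) for p in phi]


def toy_model(z):
    # Hypothetical score: ignores the sign of z[0], plus one interaction term.
    return z[0] ** 2 + z[1] * z[2]


baseline = [0.0, 0.0, 0.0]        # assumed reference input
sample_a = [3.0, 1.0, 2.0]
sample_b = [-3.0, 1.0, 2.0]       # far from sample_a in data space

phi_a = shapley_values(toy_model, sample_a, baseline)
phi_b = shapley_values(toy_model, sample_b, baseline)

print("data-space distance:   ", dist(sample_a, sample_b))
print("Shapley-space distance:", dist(phi_a, phi_b))
```

In this toy case the two samples are 6 units apart in data space but coincide in Shapley space, because the model treats them identically. Enumerating all orderings scales factorially with the number of features, so this exact form is only practical for a handful of features.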

Critical review

Verdict

Bottom line

The paper presents a novel and theoretically well-motivated approach to improving DBM quality by leveraging Shapley values. The SVHN results are compelling: map accuracy (MA) rises from 25.3% to 85.4%. However, the benefits are inconsistent. On MNIST the method scores lower than a plain data-space projection (MA 91.8% vs 96.8%), and Shapley value estimation is expensive, taking close to 18 hours on the largest dataset, which limits practical applicability.

“Compared to the data-space derived map (Figure 1(a)), we see a significant improvement in the separation and shape of decision zones, reflected in the MA increasing from 25.3% to 85.4%”

paper · Section 5.1

“Shapley value estimation is computationally expensive. On our largest dataset, estimation took close to 18 hours”

paper · Section 6

“The data-space projections map (Figure 2(a)) has a marginally higher MA of 96.8% compared to our Shapley-projection map, where MA=91.8%”

paper · Section 5.1
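As a quick check on the scale of these differences, the short sketch below computes the change in MA for the two datasets quoted above. The dictionary layout and variable names are illustrative; the figures are the ones cited from Section 5.1.

```python
# MA values in percent, as quoted above: (data-space map, Shapley-space map)
ma = {
    "SVHN": (25.3, 85.4),
    "MNIST": (96.8, 91.8),
}

deltas = {}
for name, (data_space, shapley_space) in ma.items():
    deltas[name] = shapley_space - data_space
    print(f"{name}: {deltas[name]:+.1f} percentage points")
```

The SVHN gain is roughly twelve times the size of the MNIST loss, which is worth keeping in mind when weighing the inconsistency.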

What holds up

The core theoretical insight, that samples treated similarly by the model naturally cluster in Shapley space, provides a principled solution to the fragmentation problem identified in prior work. The SVHN case study successfully produces the first high-quality DBM for this dataset in the literature, visually demonstrating compact decision zones where data-space methods yield chaotic, fragmented maps. The metric-based evaluation is comprehensive, and the authors honestly acknowledge trade-offs in their inverse projection analysis.

“we see a significant improvement in the separation and shape of decision zones”

paper · Section 5.1, SVHN results

“Shapley space derived samples consistently have muted colours. P(D) reconstructions make errors in structure”

paper · Section 5.2

Main concerns

Three issues limit the paper's claims. First, performance is inconsistent: ShapDBM underperforms on MNIST (MA 91.8% vs 96.8%) and shows no clear gain on CIFAR-4, where both methods stay below 50% MA, suggesting benefits are limited to cases where data-space DR catastrophically fails. Second, the inverse projection reconstruction quality is significantly degraded in Shapley space, raising validity questions about the synthetic samples despite the authors' argument that sampling farther from original data "is good for exploring f". Third, the arbitrary CIFAR-4 subset selection (excluding 6 classes) weakens the evaluation of "complex" datasets.

“CIFAR-4: Data space MA = 44.4%, our method MA = 44.4%”

paper · Table 2

“P(S) cannot reconstruct the samples D as well as P(D)”

paper · Section 5.2

“we use a subset referred to as CIFAR-4... classes airplane, cat, deer and ship”

paper · Section 4

Evidence and comparison

The evidence supports the core claim for SVHN but not uniformly across datasets. The comparison is limited by testing only t-SNE in the main text (UMAP relegated to supplements) and using only one CNN architecture. Both methods perform poorly on CIFAR-4 (<50% MA), undermining the claim of handling increasingly complex datasets. Related work is adequately cited, though alternatives to Shapley values for feature importance are not meaningfully compared.

“The challenge presented by CIFAR-4 is reflected in the overall quality the maps computed by data-space projections and our method – both struggling to achieve >50% MA values”

paper · Section 5.1, CIFAR-4

“For space constraints, we focus next on t-SNE (UMAP results are given in the supp. material)”

paper · Section 4

Reproducibility

Experimental details are generally sufficient: fixed random seed, hyperparameters specified ($r=500$, $l=1$, 100 DeepExplainer samples), default scikit-learn t-SNE, and a GitHub link for the CNN architecture. However, reproducing the work requires significant computational resources (close to 18 hours of Shapley value estimation on the largest dataset), and the paper provides no containerization, dependency specifications, or code for the full pipeline. Only the model architecture is referenced online.

“The full architecture is available online”

paper · Section 4

“Shapley value estimation is computationally expensive. On our largest dataset, estimation took close to 18 hours”

paper · Section 6
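The missing dependency specification is the kind of gap a short environment snapshot could close. The sketch below is one possible way to do this using only the Python standard library; it is not something the paper provides. The package list is an assumption based on the tools the review mentions (scikit-learn for t-SNE, and the `shap` library, which provides DeepExplainer), and `snapshot_environment` is a hypothetical helper name.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Record interpreter, platform, and installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Assumed package list; adjust to whatever the actual pipeline imports.
snapshot = snapshot_environment(["numpy", "scikit-learn", "shap"])
print(json.dumps(snapshot, indent=2))
```

Publishing such a snapshot alongside the fixed random seed would let readers recreate the software stack, although it would not reduce the compute cost of re-estimating the Shapley values.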

Abstract

Decision Boundary Maps (DBMs) are an effective tool for visualising machine learning classification boundaries. Yet, DBM quality strongly depends on the dimensionality reduction (DR) technique and high dimensional space used for the data points. For complex ML datasets, DR can create many mixed classes which, in turn, yield DBMs that are hard to use. We propose a new technique to compute DBMs by transforming data space into Shapley space and computing DR on it. Compared to standard DBMs computed directly from data, our maps have similar or higher quality metric values and visibly more compact, easier to explore, decision zones.
