Which Concepts to Forget and How to Refuse? Decomposing Concepts for Continual Unlearning in Large Vision-Language Models

cs.CV Hyundong Jin, Dongyoon Han, Eunwoo Kim · Mar 23, 2026
AI Review
Plain-language introduction

The paper addresses continual unlearning in Large Vision-Language Models (LVLMs), where models must sequentially remove specific vision-instruction pairs without full retraining while preserving general utility. Prior methods suffer from distorted shared representations that create spurious associations, leading to irrelevant refusals for past forget data and over-refusal of retain queries. The proposed framework, CORE (COncept-aware REfuser), decomposes deletion targets into fine-grained visual attributes and textual intents, using a concept modulator to identify which combinations characterize each forget category and a mixture of specialized refusal experts to generate contextually appropriate refusals.

Critical review
Verdict
Bottom line

The paper presents a compelling solution to a critical but understudied problem: maintaining precise refusal behavior across sequential unlearning tasks in multimodal models. The core insight, that grounding refusals in decomposed visual and textual concepts mitigates representation distortion, is well validated. The modular architecture, which combines concept modules with a learned modulator and a routing scheme for refusal experts, outperforms existing methods in the reported experiments. As reported in Section 4.2, "CORE achieves the highest AR of 88.02% and CRR of 90.67%, demonstrating the effectiveness of its relevance-based refuser activation and calibrated refuser mixture contribution." The two-stage training strategy and inference-time calibration provide practical stability for continual deployment.

“CORE achieves the highest AR of 88.02% and CRR of 90.67%, demonstrating the effectiveness of its relevance-based refuser activation and calibrated refuser mixture contribution.”
Jin et al., Sec. 4.2 · Section 4.2
What holds up

The concept-level decomposition is technically sound and empirically robust. By extracting visual attributes and linguistic intents through concept modules supervised with cosine similarity losses $\mathcal{L}_{\text{con}}=-\sum_{\text{q}\in\{\text{img},\text{txt}\}}\sum_{i=1}^{N^{t}}\text{sim}(E^{t}_{\text{q},i},\hat{E}_{\text{q},i})$, the model grounds refusals in interpretable semantics rather than spurious correlations. The ablation study confirms that each component is essential: removing the concept modulator causes CRR to drop from 86.19% to 71.14% (Last), while removing the relevance-guided activation causes catastrophic failure (29.64% CRR). The supplementary analysis also validates that "Without relevance guidance, activation concentrates on only a few refusers," whereas the proposed routing produces distinct activation patterns per task.

“$\mathcal{L}_{\text{con}}=-\sum_{\text{q}\in\{\text{img},\text{txt}\}}\sum_{i=1}^{N^{t}}\text{sim}(E^{t}_{\text{q},i},\hat{E}_{\text{q},i}),$”
Jin et al., Eq. 2 · Equation 2
“the method with ACT and CAL underperforms on most metrics, achieving 86.75% of AR and 71.14% of CRR in terms of Last”
Jin et al., Table 3 · Table 3
“Without relevance guidance, activation concentrates on only a few refusers”
Jin et al., Supp. Fig. B · Supplementary Material, Figure B
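The concept loss quoted above (Eq. 2) is a negative sum of cosine similarities over both modalities and all $N^{t}$ concepts of a task. A minimal sketch of that sum follows, in plain Python for readability. The dictionary layout, the function names, and the assignment of `e` to $E^{t}_{\text{q},i}$ and `e_hat` to $\hat{E}_{\text{q},i}$ are assumptions made for illustration; the review does not say how the paper stores or batches these embeddings.

```python
import math


def cosine_sim(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concept_loss(e, e_hat):
    """Negative summed cosine similarity, following the form of Eq. 2.

    e, e_hat: dicts with keys "img" and "txt"; each value is a list of
    concept embedding vectors (hypothetical layout, not from the paper).
    """
    total = 0.0
    for q in ("img", "txt"):                      # outer sum over modalities
        for e_qi, e_hat_qi in zip(e[q], e_hat[q]):  # inner sum over concepts
            total += cosine_sim(e_qi, e_hat_qi)
    return -total


# Toy example with two concepts per modality.
reference = {"img": [[1.0, 0.0], [0.0, 1.0]], "txt": [[1.0, 1.0], [2.0, 0.0]]}
loss_when_aligned = concept_loss(reference, reference)
```

With perfectly aligned embeddings every similarity is 1, so the loss reaches its minimum of minus the number of (modality, concept) pairs; minimizing it pushes each pair toward the same direction regardless of vector magnitude.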
Main concerns

The framework relies on LLM-generated concept descriptions (20 per category), which introduces an external dependency and potential brittleness if concept generation fails to capture nuanced visual-linguistic boundaries. While the authors test robustness across GPT, Gemini, and Claude, the method assumes continued access to capable external LLMs for each new forget category. Additionally, training cost grows with the number of concepts: Supplementary Table B shows first-stage training rising from 2.41 min (5 concepts) to 5.26 min (20 concepts) per task, which may become prohibitive for very long unlearning sequences. The paper also lacks theoretical analysis of certified removal or formal privacy guarantees, focusing solely on empirical refusal metrics. Finally, the evaluation is limited to relatively coarse-grained safety categories and ImageNet-R classification; complex multimodal reasoning tasks with entangled forget/retain boundaries are not explored.

“20 Concepts / Category: $\approx$ 5.26 min / 15.32 min”
Jin et al., Supp. Table B · Supplementary Table B
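The two first-stage timings cited above (2.41 min at 5 concepts, 5.26 min at 20 concepts) allow a quick back-of-the-envelope check of how cost grows. The sketch below draws a straight line through those two points; that linear form is an assumption, since two measurements cannot confirm it, and the helper name `line_through` is hypothetical.

```python
def line_through(p0, p1):
    """Slope and intercept of the straight line through two (x, y) points."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


# (concepts per category, first-stage minutes per task), as cited in the review.
few = (5, 2.41)
many = (20, 5.26)

concept_ratio = many[0] / few[0]   # 4.0: four times as many concepts
time_ratio = many[1] / few[1]      # about 2.18: training time roughly doubles

# Under the linear assumption: about 0.19 min per extra concept,
# plus a fixed overhead of about 1.46 min per task.
slope, intercept = line_through(few, many)
```

Read this way, quadrupling the concept count roughly doubles first-stage time, which is consistent with a fixed per-task overhead plus a per-concept increment rather than strictly proportional growth.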
Evidence and comparison

The evidence strongly supports the core claims through comprehensive comparisons against continual learning baselines (EWC, LwF, GMM) and unlearning baselines (SCRUB, O3). The evaluation metrics are well designed: Context-aware Refusal Rate (CRR) measures the semantic alignment of refusals, while Refusal Gap ($\Delta_{RR}$) captures inappropriate negation. Key limitations of the baselines are exposed: conventional methods show $\Delta_{RR}$ ranging from 26.08 to 37.43, indicating indiscriminate negation, whereas CORE achieves 3.74. However, the comparison with O3 [14] is potentially unfair because O3 was designed for unimodal LLMs; the 17.64% CRR gap may reflect architectural mismatch rather than pure methodological superiority. The analysis of concept activation patterns in Figure 6 provides convincing qualitative evidence that the modulator suppresses irrelevant concepts (shown in red) more effectively than the baseline.

“These methods exhibit high $\Delta_{RR}$ after unlearning all tasks, ranging from 26.08 to 37.43”
Jin et al., Sec. 4.2 · Table 1, Last metrics
“O3 ... shows substantial gaps of 37.85% and 17.64% compared to the proposed method, respectively”
Jin et al., Sec. 4.2 · Section 4.2
Reproducibility

The paper provides substantial implementation details in Section 4.1 and Appendix A, including model specifications (Vicuna-7B, LLaMA-2-7B with ViT-g/14), optimizer settings (Adam with $\beta_1=0.9$, $\beta_2=0.999$), and architectural hyperparameters ($N_R=20$ refusers, 2 engaged per sample). The two-stage training pipeline is clearly described, and the concept generation prompt template is explicitly provided ("Given the image and instruction pair, identify 20 visual and linguistic concepts..."). However, several elements needed for reproduction are missing or truncated in the provided text: the random seeds, the full refusal response templates (Table F is truncated), and a code repository link are not visible. The reliance on ChatGPT/Gemini/Claude for concept generation introduces non-determinism that is not quantified. While the supplementary material promises additional details, the main text lacks the explicit data split ratios and prototype storage mechanisms needed for exact reproduction.

“Given the image and instruction pair, identify 20 visual and linguistic concepts corresponding to visual and textual modalities, respectively.”
Jin et al., Supp. Table A · Supplementary Table A
“We set the number of refusers $N_R$ to 20 and engage two refusers for each sample”
Jin et al., Sec. 4.1 · Section 4.1
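The setting quoted above, 20 refusers with two engaged per sample, has the shape of top-k routing in a mixture of experts. The sketch below shows that generic pattern only: it takes a list of per-refuser relevance scores as given, keeps the two highest, and normalizes their weights with a softmax. The function name, the score input, and the softmax weighting are all assumptions; the paper's relevance-based activation and inference-time calibration are not reproduced here.

```python
import math


def route_top_k(relevance, k=2):
    """Select the k highest-scoring refusers and softmax-normalize their weights.

    relevance: list of per-refuser scores (hypothetical input; how the paper
    computes relevance is not described in this review).
    Returns a list of (refuser_index, weight) pairs, highest score first.
    """
    order = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    top = order[:k]
    # Subtract the max score before exponentiating for numerical stability.
    peak = max(relevance[i] for i in top)
    exps = [math.exp(relevance[i] - peak) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]


# Toy example: 20 refusers, two of which are clearly relevant to the sample.
scores = [0.0] * 20
scores[3] = 2.0
scores[7] = 1.0
engaged = route_top_k(scores, k=2)  # refusers 3 and 7, with 3 weighted higher
```

Engaging only a small subset per sample keeps most refusers untouched by any single task, which is the property the review credits for reusing refusers on similar concepts and adapting underutilized ones for new concepts.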
Abstract

Continual unlearning poses the challenge of enabling large vision-language models to selectively refuse specific image-instruction pairs in response to sequential deletion requests, while preserving general utility. However, sequential unlearning updates distort shared representations, creating spurious associations between vision-language pairs and refusal behaviors that hinder precise identification of refusal targets, resulting in inappropriate refusals. To address this challenge, we propose a novel continual unlearning framework that grounds refusal behavior in fine-grained descriptions of visual and textual concepts decomposed from deletion targets. We first identify which visual-linguistic concept combinations characterize each forget category through a concept modulator, then determine how to generate appropriate refusal responses via a mixture of refusal experts, termed refusers, each specialized for concept-aligned refusal generation. To generate concept-specific refusal responses across sequential tasks, we introduce a multimodal, concept-driven routing scheme that reuses refusers for tasks sharing similar concepts and adapts underutilized ones for novel concepts. Extensive experiments on vision-language benchmarks demonstrate that the proposed framework outperforms existing methods by generating concept-grounded refusal responses and preserving the general utility across unlearning sequences.
