Retrieving Climate Change Disinformation by Narrative

cs.CL Max Upravitelev, Veronika Solopova, Charlott Jakob, Premtim Sahitaj, Vera Schmitt · Mar 23, 2026

Plain-language introduction

This paper reframes climate disinformation detection from classification to retrieval, treating narrative core messages as queries to rank corpus texts without fixed taxonomies. The authors propose SpecFi, which generates hypothetical documents using community summaries from graph-based community detection (NodeRAG) as few-shot examples. The approach achieves a MAP of 0.505 on CARDS and is more robust to high narrative variance than standard baselines: BM25 loses 63.4% of its MAP on high-variance narratives, while SpecFi-CS loses 32.7%.
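
The retrieval framing described above can be illustrated with a minimal sketch. It is not the authors' implementation. The `embed` function is a toy hashed bag-of-words stand-in for a real sentence encoder, `generate_hypotheticals` is a hypothetical stub marking where an LLM prompted with few-shot examples would be called, and averaging the hypothetical-document embeddings into one query vector is an assumption borrowed from HyDE-style retrieval. Only the overall shape (narrative query, hypothetical documents, embedding-based ranking) follows the review's description.

```python
import hashlib
import numpy as np

DIM = 4096  # toy embedding size (assumption, not from the paper)

def embed(text: str) -> np.ndarray:
    """Toy hashed bag-of-words embedding; stands in for a real encoder."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def generate_hypotheticals(narrative: str, few_shot: list[str], n: int = 10) -> list[str]:
    """Hypothetical stub: a real system would prompt an LLM here.

    The placeholder simply pairs the narrative with each few-shot example
    in turn so that the rest of the pipeline can run.
    """
    return [f"{narrative} {few_shot[i % len(few_shot)]}" for i in range(n)]

def rank_corpus(narrative: str, few_shot: list[str], corpus: list[str], n: int = 10):
    """Rank corpus texts by similarity to the mean hypothetical-document embedding."""
    hypotheticals = generate_hypotheticals(narrative, few_shot, n)
    query_vec = np.mean([embed(h) for h in hypotheticals], axis=0)
    doc_vecs = np.stack([embed(d) for d in corpus])
    scores = doc_vecs @ query_vec          # dot product with unit-norm documents
    order = np.argsort(-scores)            # highest score first
    return [(corpus[i], float(scores[i])) for i in order]

corpus = [
    "the sun drives warming not co2",
    "renewable energy subsidies are costly",
    "football scores from last night",
]
ranked = rank_corpus(
    narrative="climate change is caused by the sun",
    few_shot=["solar cycles explain the warming trend"],
    corpus=corpus,
)
```

In this toy setup the few-shot list plays the role that community summaries play in SpecFi-CS; swapping in a real encoder and generator would not change the ranking logic.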

Critical review

Verdict

Bottom line

The paper presents a compelling retrieval-based approach to climate disinformation detection that addresses the rigidity of fixed taxonomies. By combining HyDE-style generation with graph-derived community summaries, SpecFi-CS achieves strong zero-shot performance and demonstrates superior robustness to narrative variance compared to sparse and dense baselines. The finding that unsupervised community summaries converge toward expert taxonomies is particularly noteworthy for emerging narrative detection.

“On CARDS, SpecFi-CS with the abliterated model achieves the highest MAP (0.505) among all label-free setups”

Upravitelev et al., Table 2 · Section 4 Evaluation

“BM25 loses 63.4% of its MAP on high-variance narratives; SpecFi-CS-a loses 32.7%”

Upravitelev et al., Figure 3 · Section 5 Statistical Analysis
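
For readers unfamiliar with the metric, the sketch below shows how mean average precision (MAP) is commonly computed and how a relative loss such as the quoted percentages can be derived. It assumes each query comes with a full ranking of the corpus, so every relevant text appears somewhere in the list. The `relative_loss` helper and its definition (the drop from a low-variance group to a high-variance group, divided by the low-variance value) are assumptions; the review does not state exactly how the paper computes the loss.

```python
def average_precision(ranked_relevance: list[int]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is 1 if the text at rank i + 1 is relevant, else 0.
    Assumes the ranking covers the whole corpus.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(runs: list[list[int]]) -> float:
    """Mean of per-query average precision."""
    return sum(average_precision(r) for r in runs) / len(runs)

def relative_loss(map_low_variance: float, map_high_variance: float) -> float:
    """Hypothetical helper: fraction of MAP lost between two narrative groups."""
    return (map_low_variance - map_high_variance) / map_low_variance

# Illustrative rankings only; these are not results from the paper.
example_runs = [[1, 0, 1, 0], [0, 1]]
example_map = mean_average_precision(example_runs)
```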

What holds up

The architectural intuition bridging abstract narrative descriptions to concrete textual instantiations through hypothetical document generation is well-motivated and empirically validated. The proposed narrative variance metric $V_i = \frac{1}{m_i} \sum_{j=1}^{m_i} \lVert \mathbf{t}_{ij} - \mathbf{c}_i \rVert_2^2$ provides a useful embedding-based predictor of retrieval difficulty, showing that internal narrative spread—not inter-narrative separation—drives performance degradation. The observation that community summaries align with expert taxonomies despite operating without labels offers a promising pathway for unsupervised narrative discovery.

“Narrative Variance measures the overall spread of texts around the centroid: $V_i = \frac{1}{m_i} \sum_{j=1}^{m_i} \lVert \mathbf{t}_{ij} - \mathbf{c}_i \rVert_2^2$”

Upravitelev et al., Eq. 2 · Section 3.3
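
The quoted definition translates almost directly into code. The sketch below assumes that the centroid $\mathbf{c}_i$ is the mean of the narrative's text embeddings $\mathbf{t}_{ij}$, which is the usual reading of "centroid" but is not spelled out in the quote; the function name and the toy inputs are illustrative.

```python
import numpy as np

def narrative_variance(text_embeddings: np.ndarray) -> float:
    """Mean squared Euclidean distance of a narrative's texts to their centroid.

    text_embeddings has shape (m_i, d): one row per text of narrative i.
    """
    centroid = text_embeddings.mean(axis=0)                       # c_i (assumed mean)
    sq_dists = np.sum((text_embeddings - centroid) ** 2, axis=1)  # ||t_ij - c_i||^2
    return float(sq_dists.mean())                                 # V_i

# Toy embeddings: a spread-out narrative versus a tight one.
spread = np.array([[0.0, 0.0], [2.0, 0.0]])
tight = np.array([[1.0, 1.0], [1.0, 1.0]])
v_spread = narrative_variance(spread)
v_tight = narrative_variance(tight)
```

A narrative whose texts sit far from their centroid receives a larger value, which is the property the review links to retrieval difficulty.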

“Of the 17 CARDS narratives, 11 receive summaries that align with the taxonomy label at least at the super-claim level”

Upravitelev et al. · Section 6 Discussion

Main concerns

The statistical analysis relies heavily on CARDS (17 narratives). Climate Obstruction (7 narratives) and PolyNarrative (a mean of 2 texts per narrative) lack sufficient power for reliable inference, and contradictory trends on PolyNarrative are dismissed as artifacts of small sample size. The reliance on OpenAI models for NodeRAG's graph construction undermines full reproducibility, and the high computational cost (about 24 seconds per narrative versus under a second for BM25) limits practical deployment. The fixed choice of $n=10$ hypothetical documents is not systematically ablated, and potential data contamination (CARDS dates from 2021 and is likely to appear in LLM training data) raises questions about the validity of the zero-shot results.

“On CO, correlations should be interpreted with caution given the limited number of narratives ($K=7$)”

Upravitelev et al. · Section 5 Statistical Analysis
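
The caution about $K=7$ can be made concrete with the standard Fisher z-transform confidence interval for a Pearson correlation. This is a sketch of the general statistical point, not a reproduction of the paper's analysis: the correlation value 0.5 is hypothetical, and the paper reportedly uses partial correlations, for which each control variable removes a further degree of freedom and widens the interval.

```python
import math

def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a Pearson correlation r from n points."""
    if n <= 3:
        raise ValueError("Fisher interval needs n > 3")
    z = math.atanh(r)                 # Fisher z-transform
    se = 1.0 / math.sqrt(n - 3)       # standard error in z-space
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Same hypothetical correlation, evaluated at the two dataset sizes.
ci_co = fisher_ci(0.5, n=7)       # Climate Obstruction: 7 narratives
ci_cards = fisher_ci(0.5, n=17)   # CARDS: 17 narratives
```

With 7 narratives the interval for a correlation of 0.5 runs from roughly -0.41 to 0.91 and therefore includes zero, whereas with 17 narratives it stays above zero. This is why correlations computed on the smaller dataset carry little evidential weight.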

“SpecFi-CS requires approximately 24 seconds per narrative on CARDS compared to <1 second for BM25”

Upravitelev et al. · Limitations

“The number of hypothetical documents ($n=10$) was selected via preliminary testing and not systematically ablated”

Upravitelev et al. · Limitations

“CARDS dataset is from 2021, making it likely to be part of the training data of LLMs”

Upravitelev et al. · Limitations

Evidence and comparison

The evidence supports the primary claim that SpecFi-CS outperforms baselines on CARDS, but cross-dataset generalization is weaker: SpecFi-DR outperforms SpecFi-CS on Climate Obstruction (0.519 vs. 0.491), suggesting that the efficacy of community summaries varies with dataset characteristics. The comparison between "convergence," "collapse," and "drift" patterns in community summaries provides diagnostic value, though the small sample sizes prevent robust statistical validation of these failure modes across different narrative types.

“On CO, SpecFi-DR outperforms SpecFi-CS (0.519 vs. 0.491), suggesting that the relative advantage of community summaries over retrieved texts depends on dataset characteristics”

Upravitelev et al. · Section 4 Evaluation

“Narratives 4_1 and 4_2 are collapsed with a sibling sub-narrative”

Upravitelev et al. · Section 6 Discussion

Reproducibility

Full reproduction is blocked by the dependency on OpenAI models for NodeRAG's structured output generation during graph construction, even though the authors provide code for SpecFi-DR with open-source alternatives. The critical hyperparameter ($n=10$ hypothetical documents) was selected via preliminary testing without systematic ablation. Although the authors point to reference code in an anonymized repository, the runtime requirements (H100 GPU for embeddings, API access for generation) and the potential leakage of the evaluation corpus into LLM training data present practical barriers to independent verification of the zero-shot claims.

“SpecFi-CS setups include one reliance on OpenAI models within NodeRAG”

Upravitelev et al. · Limitations

“Reference code is available at: https://anonymous.4open.science/r/SpecFi/”

Upravitelev et al., Algorithm 1 · Section 3.1

“All experiments were run on a system with an NVIDIA H100 GPU”

Upravitelev et al. · Appendix A.1.1

Abstract

Detecting climate disinformation narratives typically relies on fixed taxonomies, which do not accommodate emerging narratives. Thus, we re-frame narrative detection as a retrieval task: given a narrative's core message as a query, rank texts from a corpus by alignment with that narrative. This formulation requires no predefined label set and can accommodate emerging narratives. We repurpose three climate disinformation datasets (CARDS, Climate Obstruction, climate change subset of PolyNarrative) for retrieval evaluation and propose SpecFi, a framework that generates hypothetical documents to bridge the gap between abstract narrative descriptions and their concrete textual instantiations. SpecFi uses community summaries from graph-based community detection as few-shot examples for generation, achieving a MAP of 0.505 on CARDS without access to narrative labels. We further introduce narrative variance, an embedding-based difficulty metric, and show via partial correlation analysis that standard retrieval degrades on high-variance narratives (BM25 loses 63.4% of MAP), while SpecFi-CS remains robust (32.7% loss). Our analysis also reveals that unsupervised community summaries converge on descriptions close to expert-crafted taxonomies, suggesting that graph-based methods can surface narrative structure from unlabeled text.
