Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks

cs.CR cs.AI Yanming Mu, Hao Hu, Feiyang Li, Qiao Yuan, Jiang Wu, Zichuan Liu, Pengcheng Liu, Mei Wang, Hongwei Zhou, Yuling Liu · Mar 23, 2026

What it does

Why it matters

This survey systematically maps threats across the RAG pipeline—vector database construction, retrieval, and generation—and categorizes corresponding defenses from input-side access control to output-side privacy preservation. As a...

Main concern

Community signal

0 up · 0 down

AI Review AI reviewed

Plain-language introduction

Retrieval-Augmented Generation (RAG) systems mitigate large language model hallucinations by integrating external knowledge bases, yet this multi-module architecture introduces complex security vulnerabilities spanning data poisoning, membership inference, and adversarial manipulation. This survey systematically maps threats across the RAG pipeline—vector database construction, retrieval, and generation—and categorizes corresponding defenses from input-side access control to output-side privacy preservation. As a comprehensive review of 152 papers, it aims to unify the analysis of threat models, defense mechanisms, and evaluation benchmarks to foster trustworthy RAG deployments in sensitive domains.

Critical review

Verdict

Bottom line

This paper delivers a comprehensive and timely synthesis of RAG security research, effectively mapping the threat landscape across the retrieval-generation pipeline. The authors structure their analysis around a clear taxonomy that distinguishes between attacks on data integrity (poisoning), confidentiality (membership inference, embedding inversion), and operational reliability (adversarial and indirect attacks). However, the claim of being the "first end-to-end survey" is nuanced by Table 1, which shows other surveys cover overlapping ground, albeit with different emphases; the primary value lies in the unified coverage of evaluation benchmarks which others omit.

“To the best of our knowledge, this paper presents the first end-to-end survey dedicated to the security of RAG systems.”

this survey · Abstract

“Comparison of Research Scopes in Existing Survey Literature on RAG Security”

this survey · Table 1

What holds up

The paper’s strength lies in its systematic architectural analysis and granular threat categorization. The workflow decomposition into Vector Database Construction, Retriever, and Generator provides a rigorous foundation for pinpointing vulnerabilities. The detailed breakdown of data poisoning attacks—explicitly defining the dual constraints of retrieval and generation conditions (Algorithm 1)—and the comprehensive taxonomy of defenses spanning access control, homomorphic encryption, and differential privacy demonstrate thorough coverage. The bibliometric analysis noting a surge from fewer than 5 papers annually pre-2024 to 48 in 2025 convincingly establishes the field's rapid evolution into a critical research frontier.

“retrieval condition requires the injected text to exhibit high similarity to the target query within the semantic vector space... generation condition dictates that the malicious text, once integrated into the context, must be highly misleading”

this survey · Section 3.1.1

“the publication count surged significantly to 18 in 2024 and further skyrocketed to 48 in 2025”

this survey · Figure 1 caption

Main concerns

While comprehensive, the survey prioritizes breadth over depth in technical mechanisms, occasionally offering high-level descriptions that may insufficiently distinguish between methodologically distinct defenses. For instance, the discussion of homomorphic encryption notes "significant computational and communication overhead" without quantifying these costs relative to standard RAG latency, which limits practical assessment of deployability. Additionally, the field’s rapid evolution—acknowledged by the authors—risks rapid obsolescence; claims about specific state-of-the-art attacks or defenses may shift given the 2025 publication surge depicted in Figure 1. The reliance on algorithmic descriptions without accompanying complexity analysis or concrete parameter settings limits implementation guidance for practitioners.

“homomorphic encryption often incurs significant computational and communication overhead, easily becoming a performance bottleneck under large-scale vector databases”

this survey · Section 4.1.2

“Data Poisoning Attacks emerges as the most prominent research area (28.4%), significantly overshadowing other specific attack and defense vectors”

this survey · Figure 1b caption

Evidence and comparison

The evidence base comprises 152 papers, systematically categorized across threats, defenses, and benchmarks. The comparison matrix (Table 1) effectively positions this work against five concurrent surveys (Wu, Gu, He, Arz, Wang 2025), demonstrating superior coverage in "Evaluation Benchmarks & Metrics System" and "Security Defense & Mitigation Strategies." The evidence for specific threats is well-cited; for example, data poisoning is grounded in PoisonedRAG [45] and PR-Attack [46], while membership inference references S²MIA [56] and DC-MIA [59]. The survey correctly identifies the imbalance in current research: data poisoning dominates (28.4%) while defense frameworks (14.9%) and privacy preservation (10.4%) lag behind.

“Based on a comprehensive analysis of 152 relevant papers, we examine the security risks introduced by each module within the RAG multi-modular architecture”

this survey · Section 1

“This survey achieves comprehensive coverage across four core dimensions, specifically encompassing architectural analysis, threat analysis, defense strategies, and evaluation metrics”

this survey · Table 1

Reproducibility

As a literature survey, reproducibility hinges on the transparency and completeness of its methodology. The authors state they conducted a "systematic and extensive survey of 152 relevant papers," but the provided text omits the specific search protocol, inclusion criteria, or PRISMA-style screening process. No supplementary materials, code repositories, or interactive databases are mentioned to allow readers to reproduce the literature selection or verify the taxonomy construction. The bibliography is truncated in the provided text, preventing independent verification of key citations. While the taxonomy itself is reproducible in structure, the exact paper selection methodology remains opaque.

Abstract

Retrieval-Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system-level security vulnerabilities. Guided by the RAG workflow, this paper analyzes the underlying vulnerability mechanisms and systematically categorizes core threat vectors such as data poisoning, adversarial attacks, and membership inference attacks. Based on this threat assessment, we construct a taxonomy of RAG defense technologies from a dual perspective encompassing both input and output stages. The input-side analysis reviews data protection mechanisms including dynamic access control, homomorphic encryption retrieval, and adversarial pre-filtering. The output-side examination summarizes advanced leakage prevention techniques such as federated learning isolation, differential privacy perturbation, and lightweight data sanitization. To establish a unified benchmark for future experimental design, we consolidate authoritative test datasets, security standards, and evaluation frameworks. To the best of our knowledge, this paper presents the first end-to-end survey dedicated to the security of RAG systems. Distinct from existing literature that isolates specific vulnerabilities, we systematically map the entire pipeline-providing a unified analysis of threat models, defense mechanisms, and evaluation benchmarks. By enabling deep insights into potential risks, this work seeks to foster the development of highly robust and trustworthy next-generation RAG systems.

Challenge the Review

Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.

Challenges are public to read, but only signed-in members can post them. Your challenge text is stored with your account for moderation, but usernames are not shown in the public thread.

No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.