Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks
Retrieval-Augmented Generation (RAG) systems mitigate large language model hallucinations by integrating external knowledge bases, yet this multi-module architecture introduces complex security vulnerabilities spanning data poisoning, membership inference, and adversarial manipulation. This survey systematically maps threats across the RAG pipeline—vector database construction, retrieval, and generation—and categorizes corresponding defenses from input-side access control to output-side privacy preservation. As a comprehensive review of 152 papers, it aims to unify the analysis of threat models, defense mechanisms, and evaluation benchmarks to foster trustworthy RAG deployments in sensitive domains.
This paper delivers a comprehensive and timely synthesis of RAG security research, effectively mapping the threat landscape across the retrieval-generation pipeline. The authors structure their analysis around a clear taxonomy that distinguishes between attacks on data integrity (poisoning), confidentiality (membership inference, embedding inversion), and operational reliability (adversarial and indirect attacks). However, the claim of being the "first end-to-end survey" is nuanced by Table 1, which shows other surveys cover overlapping ground, albeit with different emphases; the primary value lies in the unified coverage of evaluation benchmarks which others omit.
The paper’s strength lies in its systematic architectural analysis and granular threat categorization. The workflow decomposition into Vector Database Construction, Retriever, and Generator provides a rigorous foundation for pinpointing vulnerabilities. The detailed breakdown of data poisoning attacks—explicitly defining the dual constraints of retrieval and generation conditions (Algorithm 1)—and the comprehensive taxonomy of defenses spanning access control, homomorphic encryption, and differential privacy demonstrate thorough coverage. The bibliometric analysis noting a surge from fewer than 5 papers annually pre-2024 to 48 in 2025 convincingly establishes the field's rapid evolution into a critical research frontier.
While comprehensive, the survey prioritizes breadth over depth in technical mechanisms, occasionally offering high-level descriptions that may insufficiently distinguish between methodologically distinct defenses. For instance, the discussion of homomorphic encryption notes "significant computational and communication overhead" without quantifying these costs relative to standard RAG latency, which limits practical assessment of deployability. Additionally, the field’s rapid evolution—acknowledged by the authors—risks rapid obsolescence; claims about specific state-of-the-art attacks or defenses may shift given the 2025 publication surge depicted in Figure 1. The reliance on algorithmic descriptions without accompanying complexity analysis or concrete parameter settings limits implementation guidance for practitioners.
The evidence base comprises 152 papers, systematically categorized across threats, defenses, and benchmarks. The comparison matrix (Table 1) effectively positions this work against five concurrent surveys (Wu, Gu, He, Arz, Wang 2025), demonstrating superior coverage in "Evaluation Benchmarks & Metrics System" and "Security Defense & Mitigation Strategies." The evidence for specific threats is well-cited; for example, data poisoning is grounded in PoisonedRAG [45] and PR-Attack [46], while membership inference references S²MIA [56] and DC-MIA [59]. The survey correctly identifies the imbalance in current research: data poisoning dominates (28.4%) while defense frameworks (14.9%) and privacy preservation (10.4%) lag behind.
As a literature survey, reproducibility hinges on the transparency and completeness of its methodology. The authors state they conducted a "systematic and extensive survey of 152 relevant papers," but the provided text omits the specific search protocol, inclusion criteria, or PRISMA-style screening process. No supplementary materials, code repositories, or interactive databases are mentioned to allow readers to reproduce the literature selection or verify the taxonomy construction. The bibliography is truncated in the provided text, preventing independent verification of key citations. While the taxonomy itself is reproducible in structure, the exact paper selection methodology remains opaque.
Retrieval-Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system-level security vulnerabilities. Guided by the RAG workflow, this paper analyzes the underlying vulnerability mechanisms and systematically categorizes core threat vectors such as data poisoning, adversarial attacks, and membership inference attacks. Based on this threat assessment, we construct a taxonomy of RAG defense technologies from a dual perspective encompassing both input and output stages. The input-side analysis reviews data protection mechanisms including dynamic access control, homomorphic encryption retrieval, and adversarial pre-filtering. The output-side examination summarizes advanced leakage prevention techniques such as federated learning isolation, differential privacy perturbation, and lightweight data sanitization. To establish a unified benchmark for future experimental design, we consolidate authoritative test datasets, security standards, and evaluation frameworks. To the best of our knowledge, this paper presents the first end-to-end survey dedicated to the security of RAG systems. Distinct from existing literature that isolates specific vulnerabilities, we systematically map the entire pipeline-providing a unified analysis of threat models, defense mechanisms, and evaluation benchmarks. By enabling deep insights into potential risks, this work seeks to foster the development of highly robust and trustworthy next-generation RAG systems.
Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.
No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.