Enhancing Leakage Attacks on Searchable Symmetric Encryption Using LLM-Based Synthetic Data Generation

Published: April 29, 2025 | arXiv ID: 2504.20414v1

By: Joshua Chiu, Partha Protim Paul, Zahin Wahab

Potential Business Impact:

Makes secret messages easier to break.

Business Areas:

Semantic Search, Internet Services

Searchable Symmetric Encryption (SSE) enables efficient search over encrypted data, allowing users to maintain privacy while using cloud storage. However, SSE schemes are vulnerable to leakage attacks that exploit access patterns, search frequency, and volume information. Existing studies frequently assume that adversaries possess a substantial fraction of the encrypted dataset to mount effective inference attacks. This implies that a large share of the documents has already leaked, an assumption that may not hold in real-world scenarios. In this work, we investigate the feasibility of enhancing leakage attacks under a more realistic threat model in which adversaries have access to minimal leaked data. We propose a novel approach that leverages large language models (LLMs), specifically GPT-4 variants, to generate synthetic documents that statistically and semantically resemble the real-world Enron email dataset. Using this email corpus as a case study, we evaluate how synthetic data generated via random sampling and hierarchical clustering affects the performance of the SAP (Search Access Pattern) keyword inference attack restricted to token volumes only. Our results demonstrate that, while the choice of LLM has limited effect, increasing dataset size and employing clustering-based generation significantly improve attack accuracy, achieving performance comparable to attacks that use larger amounts of real data. We highlight the growing relevance of LLMs in adversarial contexts.
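To make the idea of a volume-only keyword inference attack concrete, here is a minimal sketch in Python. It is not the SAP attack evaluated in the paper: it swaps in a much simpler rank-matching heuristic, and the function names, the toy documents, and the observed token volumes are all hypothetical. The adversary estimates how often each candidate keyword occurs in auxiliary (for example, synthetic) documents, then pairs each observed query token with the keyword whose estimated volume has the same rank.

```python
def keyword_volumes(documents, keywords):
    """Fraction of documents that contain each keyword (hypothetical helper)."""
    counts = {kw: 0 for kw in keywords}
    for doc in documents:
        words = set(doc.lower().split())
        for kw in keywords:
            if kw in words:
                counts[kw] += 1
    return {kw: counts[kw] / len(documents) for kw in keywords}


def match_by_volume(aux_volumes, observed_volumes):
    """Pair tokens with keywords by volume rank.

    This is a simplified stand-in for a real attack: with equally many
    tokens and keywords, pairing in sorted order minimizes the total
    absolute difference between observed and estimated volumes.
    """
    kws = sorted(aux_volumes, key=lambda k: (aux_volumes[k], k))
    toks = sorted(observed_volumes, key=lambda t: (observed_volumes[t], t))
    return dict(zip(toks, kws))


# Toy auxiliary corpus standing in for LLM-generated synthetic emails.
synthetic_docs = [
    "meeting schedule for the energy contract",
    "please review the contract before the meeting",
    "energy prices and the contract",
    "the contract is signed",
    "lunch meeting tomorrow",
]
keywords = ["contract", "meeting", "energy", "lunch"]

# Estimated volumes from the auxiliary data.
aux = keyword_volumes(synthetic_docs, keywords)

# Hypothetical volumes leaked by four opaque query tokens on the real,
# encrypted dataset (fraction of documents each token matched).
observed = {"t1": 0.55, "t2": 0.25, "t3": 0.85, "t4": 0.35}

guess = match_by_volume(aux, observed)
print(guess)
```

The sketch shows why auxiliary data quality matters: the closer the volume estimates from the synthetic documents are to the real distribution, the more likely the rank ordering, and hence the token-to-keyword guess, is correct.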

Leakage-abuse Attack Against Substring-SSE with Partially Known Dataset

Cryptography and Security

Breaks hidden messages even with some clues.

2 Nov 2025

87%

Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion

Computation and Language

AI might be cheating to find answers.

19 Apr 2025

87%

Revisiting the attacker's knowledge in inference attacks against Searchable Symmetric Encryption

Cryptography and Security

Makes secret searches safer from snoops.

14 Apr 2025

Page Count

9 pages
