Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy

cs.LG Andrii Shportko · Mar 23, 2026

What it does

Why it matters

Main concern

Community signal

0 up · 0 down

AI Review AI reviewed

Plain-language introduction

This paper establishes information-theoretic limits on LLM steganography, proving that any semantic-preserving embedding of a payload $P$ into a covertext $M_1$ to produce stegotext $M_2$ must increase Kolmogorov complexity by at least $K(P) - O(\log n)$. Since Kolmogorov complexity is uncomputable, the authors propose perplexity ratios (specifically the Binoculars score) as a practical proxy and validate the approach on a color-based encoding scheme with 300 samples.

Critical review

Verdict

Bottom line

The theoretical result (Theorem 3) is sound and elegantly proven using symmetry-of-information arguments, establishing a 'no free lunch' principle for steganography. However, the experimental validation is limited to a single synthetic encoding scheme (synesthetic color embedding), and the connection from Kolmogorov complexity to the proposed Quadroculars score relies on strong, unverifiable assumptions about model optimality. The paper succeeds as a theoretical foundation but overstates the practical certainty of its proxy detection method.

“K(M_2) \geq K(M_1) + K(P) - O(\log n)”

Shportko, Theorem 3 · Section 3

“Under assumptions A1–A3, the Quadroculars score satisfies Q(M_2,M_1) = \frac{K(M_2)-K(M_1)}{T \cdot \log \mathrm{X\text{-}PPL}_{\mathcal{M}_1,\mathcal{M}_2}(M_1)} + R”

Shportko, Proposition 2 · Section 4.4

What holds up

The main theoretical bound is tight and correctly derived. The proof properly applies the chain rule and symmetry-of-information theorem from Kolmogorov complexity to establish that semantic preservation combined with lossless extraction forces the complexity increase. The connection to compression-based complexity proxies (Cilibrasi & Vitányi 2005) is well-founded, and the observation that perplexity serves as a probabilistic analogue of compression length is insightful and technically correct.

“By the symmetry-of-information theorem: K(M_2) = K(M_1,P) + O(\log n)”

Shportko, Proof of Theorem 3 · Section 3, Step 3

“For any lossless compressor C and string x: K(x) \leq |C(x)| + c_C”

Shportko, Section 4.1 · Section 4.1

Main concerns

The experimental validation tests only one specific steganographic method (3-bit color codes per sentence using synesthetic associations), leaving generalizability to other embedding schemes entirely unverified. Assumption A1 (model quality) posits that $|\frac{K(x)}{T} - H_{\mathcal{M}_1}(x)| \leq \varepsilon$, which is uncheckable since $K(x)$ is uncomputable; the error bounds could be substantial in practice. The semantic-load definition (Definition 1) assumes $K(M_i|S) = O(1)$, which idealizes the fuzzy notion of natural-language equivalence and ignores that different surface forms of the 'same' meaning can have varying Kolmogorov complexity.

“The scheme embeds 3-bit color codes per sentence. The binary payload P is segmented into 3-bit chunks, each mapped to one of 8 colors (red, yellow, green, blue, purple, brown, black, white)”

Shportko, Section 5.1 · Section 5.1

“The base language model $\mathcal{M}_1$ is $\varepsilon$-optimal for the source in the sense that for every string x of length T, |H_{\mathcal{M}_1}(x) - \frac{K(x)}{T}| \leq \varepsilon”

Shportko, Assumption A1 · Section 4.4

“Two messages M_1 and M_2 have identical semantic load if there exists a finite binary string S such that K(M_1|S) = O(1) and K(M_2|S) = O(1)”

Shportko, Definition 1 · Section 3

Evidence and comparison

The paper fairly distinguishes its contribution from prior work: Cachin (2004) focused on statistical security (KL divergence) while this work addresses Kolmogorov complexity; Ryabko & Ryabko (2009) bounded encoder complexity while this bounds output complexity. The comparison to Roger & Greenblatt (2023) on AI-to-AI covert channels is appropriate. However, the claim that Binoculars detects the predicted complexity increase relies on a single significant test ($p < 10^{-6}$) with moderate effect size (mean shift 0.310), and the paper acknowledges that 'single-threshold detection would have non-trivial error rates'.

“If the payload has non-trivial complexity such that K(P) \gg \log n, then K(M_2) > K(M_1)”

Shportko, Corollary 4 · Section 3

“The Binoculars shift, while statistically significant, is moderate in absolute terms (mean 0.310), and single-threshold detection would have non-trivial error rates”

Shportko, Section 6 · Section 6 (Limitations)

“We compute the log perplexity of the text in question using an 'observer' LLM. We then compute next-token predictions using a 'performer' LLM and compute their perplexity according to the observer. We call this metric cross-perplexity.”

Hans et al., 2024 · Section 3

Reproducibility

Reproducibility is significantly impaired. The encoding and decoding uses a proprietary model ('claude-sonnet-4-20250514') that is not publicly available or versioned. No code, dataset, or hyperparameters are provided—the paper merely states that 300 entries were constructed from 'Three-Body Problem' excerpts without releasing the corpus or the specific prompts used for the LLM rewriter. The Binoculars scores depend on Falcon-7B and Falcon-7B-Instruct, which are accessible, but the exact preprocessing and tokenization details necessary to replicate the scores are omitted.

“All encoding uses claude-sonnet-4-20250514”

Shportko, Section 5.2 · Section 5.2

“We construct 300 entries by independently sampling N \in {1,\ldots,6} sentences from a corpus of Three-Body Problem excerpts”

Shportko, Section 5.1 · Section 5.1

Abstract

Large language models can rewrite text to embed hidden payloads while preserving surface-level meaning, a capability that opens covert channels between cooperating AI systems and poses challenges for alignment monitoring. We study the information-theoretic cost of such embedding. Our main result is that any steganographic scheme that preserves the semantic load of a covertext~$M_1$ while encoding a payload~$P$ into a stegotext~$M_2$ must satisfy $K(M_2) \geq K(M_1) + K(P) - O(\log n)$, where $K$ denotes Kolmogorov complexity and $n$ is the combined message length. A corollary is that any non-trivial payload forces a strict complexity increase in the stegotext, regardless of how cleverly the encoder distributes the signal. Because Kolmogorov complexity is uncomputable, we ask whether practical proxies can detect this predicted increase. Drawing on the classical correspondence between lossless compression and Kolmogorov complexity, we argue that language-model perplexity occupies an analogous role in the probabilistic regime and propose the Binoculars perplexity-ratio score as one such proxy. Preliminary experiments with a color-based LLM steganographic scheme support the theoretical prediction: a paired $t$-test over 300 samples yields $t = 5.11$, $p < 10^{-6}$.

Challenge the Review

Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.

Challenges are public to read, but only signed-in members can post them. Your challenge text is stored with your account for moderation, but usernames are not shown in the public thread.

No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.