PLR: Plackett-Luce for Reordering In-Context Learning Examples

cs.LG cs.CL Pawel Batorski, Paul Swoboda · Mar 22, 2026



Plain-language introduction

The paper addresses the brittleness of in-context learning (ICL) to example ordering, an intractable $n!$ search problem. It proposes PLR, which reframes discrete permutation search as learning a Plackett-Luce distribution that concentrates probability mass on high-performing orderings. Using Gumbel perturb-and-sort for efficient sampling, PLR optimizes task-level metrics directly without requiring finite label spaces, extending naturally to open-ended reasoning tasks like mathematical problem solving.
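The Gumbel perturb-and-sort step mentioned above can be illustrated with a minimal sketch. It assumes the Plackett-Luce model is parameterized by one real-valued score (log-weight) per example; the function name `sample_pl_permutation` and the example scores are hypothetical and are not taken from the paper's code.

```python
import math
import random


def sample_pl_permutation(scores, rng):
    """Draw one ordering from a Plackett-Luce model with log-weights `scores`."""
    perturbed = []
    for s in scores:
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        # Add standard Gumbel noise: g = -log(-log(u)).
        perturbed.append(s - math.log(-math.log(u)))
    # Sorting the perturbed scores in descending order yields a PL sample.
    return sorted(range(len(scores)), key=lambda i: perturbed[i], reverse=True)


rng = random.Random(0)
# Index 0 has the largest score, so it tends to appear first.
print(sample_pl_permutation([2.0, 0.0, 0.0, 0.0], rng))
```

Each draw costs a single sort, so sampling an ordering of $n$ examples takes $O(n \log n)$ time instead of $n$ sequential softmax draws.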

Critical review

Verdict

Bottom line

PLR offers a conceptually elegant probabilistic alternative to heuristic ordering methods. By treating ordering selection as distribution learning rather than point estimation, the method can explore multiple modes of good orderings and optimizes task-level metrics directly. The empirical results show consistent improvements over strong baselines across classification and reasoning benchmarks, though the gains are most pronounced for $k \geq 8$ examples, where the permutation space becomes too large for naive search.

“We propose PLR, a probabilistic approach to in-context example ordering that replaces discrete ordering search with learning a probability distribution over orderings with the Plackett–Luce model.”

Batorski & Swoboda, Abstract

“We note that the difference to Top-K is small when few (e.g. 4) ICL examples are present, since then the number of permutations is small. We get larger performance differences once the space of permutations cannot be sampled effectively anymore, e.g. for $k\geq 8$.”

Batorski & Swoboda, Section 4.2

What holds up

The strongest aspect is the principled framing of ordering as learning a distribution $q_\phi(\pi)$ over permutations to maximize expected task performance $\mathbb{E}_{\pi \sim q_\phi}[f(D, p \oplus E_\pi)]$. The use of the Gumbel trick for efficient sampling from the Plackett-Luce model is computationally sound, and the extension to mixture-of-PL distributions (with theoretical density guarantees in Theorem 1) provides useful expressivity. The method's label-space agnosticism is a genuine advantage, demonstrated by strong results on mathematical reasoning benchmarks GSM8K and MATH500 where entropy-based methods like LocalE/GlobalE cannot apply.

“We will formulate finding the optimal ordering as fitting a distribution over permutations such that high quality permutations are given high probability mass.”

Batorski & Swoboda, Section 3.1

“Because these tasks have effectively unbounded answer spaces, label-probability based ordering methods such as LocalE/GlobalE, PDO, and DEmO are not directly applicable, so we compare against the Static and Top-K baselines.”

Batorski & Swoboda, Section 4.2
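For reference, the standard Plackett-Luce density behind $q_\phi(\pi)$ can be written out. This is the textbook form; identifying the per-example scores with the paper's parameters $\phi$ is an assumption here, since the review does not quote the paper's exact parameterization.

$$q_\phi(\pi) = \prod_{i=1}^{n} \frac{\exp(\phi_{\pi(i)})}{\sum_{j=i}^{n} \exp(\phi_{\pi(j)})}$$

As a worked instance with $n = 3$, the ordering $\pi = (2, 1, 3)$ has probability

$$q_\phi(\pi) = \frac{e^{\phi_2}}{e^{\phi_1} + e^{\phi_2} + e^{\phi_3}} \cdot \frac{e^{\phi_1}}{e^{\phi_1} + e^{\phi_3}} \cdot \frac{e^{\phi_3}}{e^{\phi_3}},$$

that is, each position is filled by a softmax draw over the examples not yet placed. A mixture with $K$ components would then take the form $q(\pi) = \sum_{m=1}^{K} w_m \, q_{\phi^{(m)}}(\pi)$ with $w_m \geq 0$ and $\sum_{m} w_m = 1$, where the weight symbols $w_m$ are notation introduced here for illustration.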

Main concerns

The empirical gains, while consistent, are often marginal, especially for small $k$, where PLR barely outperforms the Top-K baseline (equivalent to random sampling with the same budget). The method needs a labeled training split to compute task metrics, which limits its applicability in unsupervised settings. Additionally, PLR is optimized separately for each task with no cross-task transfer, and the paper does not characterize the computational overhead of iterative sampling compared to single-pass heuristics. The mixture model ablation (Table 3) also reveals overfitting risks as the number of mixture components $K$ grows, suggesting limited practical benefit beyond $K=4$.

“PLR optimizes demonstration orderings using task-level metrics (e.g., accuracy), which requires labeled data to reliably score candidate permutations.”

Batorski & Swoboda, Section 6 (Limitations)

“Overall, performance is largely stable across $K$, with only modest fluctuations... We also see that potential overfitting occurs for larger $K$.”

Batorski & Swoboda, Section 4.3, Table 3

Evidence and comparison

Comparisons to relevant work are generally fair and comprehensive, covering entropy-based methods (LocalE/GlobalE), label-distribution matching (PDO variants), and dataset-free filtering (DEmO). The evaluation protocol controls for example selection by using identical demonstration sets across methods, isolating the ordering effect. However, the paper does not report statistical significance tests for the accuracy differences in Tables 1 and 2, so it is hard to tell whether the observed gains (often less than 1-2%) are robust or fall within the margin of error. The reasoning experiments on GSM8K and MATH500 provide valuable evidence for the method's applicability beyond classification.

“This strategy prevents unfair comparisons that could result from using different sets of examples.”

Batorski & Swoboda, Section 4

“Test accuracy (%) averaged over seeds.”

Batorski & Swoboda, Table 1 caption

Reproducibility

The paper provides substantial reproducibility support: code is available at a public GitHub repository, and Appendix B details hyperparameters including iteration counts ($T=15$), samples per iteration ($B=15$), elite fraction ($\rho=0.2$), and learning rates. However, the paper does not specify whether experiments were run via API or local inference, nor does it report wall-clock time or total compute cost (e.g., GPU hours), which are necessary for assessing practical feasibility. The Gumbel sampling procedure is deterministic given random seeds, but the paper does not clarify whether reported seeds cover data splits, model initialization, or sampling noise.

“Sample $B$ permutations via Gumbel perturb-and-sort...”

Batorski & Swoboda, Algorithm 1

“Hyperparameters used for PLR... $T$ (CE iterations)... $B$ (samples/iter)... $\rho$ (elite fraction)...”

Batorski & Swoboda, Appendix B
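To make the roles of $T$, $B$, and $\rho$ concrete, the following is a rough sketch of a cross-entropy-style loop over a single Plackett-Luce distribution. It is not the authors' implementation: the gradient-ascent update on elite log-likelihood, the learning rate `lr=0.5`, the function names, and the toy reward (agreement with a hidden target ordering) are all assumptions made for illustration. In practice the reward would be a task metric computed by running the language model on a labeled split.

```python
import math
import random


def sample_permutation(scores, rng):
    """Gumbel perturb-and-sort sample from a Plackett-Luce model."""
    keys = []
    for s in scores:
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        keys.append(s - math.log(-math.log(u)))
    return sorted(range(len(scores)), key=lambda i: keys[i], reverse=True)


def pl_log_likelihood_grad(scores, perm):
    """Gradient of log q(perm) with respect to the PL scores."""
    grad = [0.0] * len(scores)
    for i in range(len(perm)):
        remaining = perm[i:]
        z = sum(math.exp(scores[j]) for j in remaining)
        grad[perm[i]] += 1.0  # chosen item at position i
        for j in remaining:   # minus softmax over items still unplaced
            grad[j] -= math.exp(scores[j]) / z
    return grad


def cross_entropy_search(reward, n, T=15, B=15, rho=0.2, lr=0.5, seed=0):
    """Assumed CE-style loop: sample, keep elites, raise their likelihood."""
    rng = random.Random(seed)
    scores = [0.0] * n  # uniform distribution over orderings at the start
    best_perm, best_reward = None, -math.inf
    n_elite = max(1, math.ceil(rho * B))
    for _ in range(T):
        samples = [sample_permutation(scores, rng) for _ in range(B)]
        scored = sorted(((reward(p), p) for p in samples),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_reward:
            best_reward, best_perm = scored[0]
        # Average the log-likelihood gradient over the elite samples.
        total = [0.0] * n
        for _, perm in scored[:n_elite]:
            for k, g in enumerate(pl_log_likelihood_grad(scores, perm)):
                total[k] += g / n_elite
        scores = [s + lr * g for s, g in zip(scores, total)]
    return best_perm, best_reward, scores


target = [3, 1, 4, 0, 2]  # hypothetical "good" ordering for the toy reward


def toy_reward(perm):
    """Fraction of positions that agree with the hidden target ordering."""
    return sum(a == b for a, b in zip(perm, target)) / len(target)


best_perm, best_reward, scores = cross_entropy_search(toy_reward, n=5)
print(best_perm, best_reward)
```

With the defaults shown, the loop makes $T \times B = 225$ reward evaluations, which is the quantity that would drive the compute cost the review asks the paper to report.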

Abstract

In-context learning (ICL) adapts large language models by conditioning on a small set of ICL examples, avoiding costly parameter updates. Among other factors, performance is often highly sensitive to the ordering of the examples. However, exhaustive search over the $n!$ possible orderings is infeasible. Therefore more efficient ordering methods use model confidence measures (e.g., label-probability entropy) over label sets or take a direct approach to finding the best ordering. We propose PLR, a probabilistic approach to in-context example ordering that replaces discrete ordering search with learning a probability distribution over orderings with the Plackett-Luce model. PLR models orderings using a Plackett-Luce distribution and iteratively updates its parameters to concentrate probability mass on high-performing orderings under a task-level metric. Candidate orderings are sampled efficiently via a Gumbel perturb-and-sort procedure. Experiments on multiple classification benchmarks show that PLR consistently improves few-shot accuracy for $k \in \{4, 8, 16, 32\}$ examples, and we further demonstrate gains on mathematical reasoning tasks where label-based ordering methods are not applicable. Our code is available at https://github.com/Batorskq/PLR.
