PLR: Plackett-Luce for Reordering In-Context Learning Examples
The paper addresses the brittleness of in-context learning (ICL) to example ordering, an intractable $n!$ search problem. It proposes PLR, which reframes discrete permutation search as learning a Plackett-Luce distribution that concentrates probability mass on high-performing orderings. Using Gumbel perturb-and-sort for efficient sampling, PLR optimizes task-level metrics directly without requiring finite label spaces, extending naturally to open-ended reasoning tasks like mathematical problem solving.
PLR offers a conceptually elegant probabilistic alternative to heuristic ordering methods. By treating ordering selection as distribution learning rather than point estimation, the method naturally explores multiple modes of good orderings and directly optimizes task-level metrics. The empirical results demonstrate consistent improvements over strong baselines across classification and reasoning benchmarks, though the gains are most pronounced when the number of examples $k \geq 8$ where the permutation space becomes prohibitively large for naive search.
The strongest aspect is the principled framing of ordering as learning a distribution $q_\phi(\pi)$ over permutations to maximize expected task performance $\mathbb{E}_{\pi \sim q_\phi}[f(D, p \oplus E_\pi)]$. The use of the Gumbel trick for efficient sampling from the Plackett-Luce model is computationally sound, and the extension to mixture-of-PL distributions (with theoretical density guarantees in Theorem 1) provides useful expressivity. The method's label-space agnosticism is a genuine advantage, demonstrated by strong results on mathematical reasoning benchmarks GSM8K and MATH500 where entropy-based methods like LocalE/GlobalE cannot apply.
The empirical gains, while consistent, are often marginal—especially for small $k$ where PLR barely outperforms the Top-K baseline (equivalent to random sampling with the same budget). The method requires labeled data for the training split to compute task metrics, limiting applicability in unsupervised settings. Additionally, PLR optimizes separately for each task without cross-task transfer, and the paper does not characterize the computational overhead of iterative sampling compared to single-pass heuristics. The mixture model ablation (Table 3) also reveals overfitting risks with larger $K$, suggesting limited practical benefit beyond $K=4$.
Comparisons to relevant work are generally fair and comprehensive, covering entropy-based methods (LocalE/GlobalE), label-distribution matching (PDO variants), and dataset-free filtering (DEmO). The evaluation protocol controls for example selection by using identical demonstration sets across methods, isolating the ordering effect. However, the paper does not report statistical significance tests for the accuracy differences in Tables 1 and 2, making it difficult to assess whether the observed gains (often less than 1-2%) are robust or within margin of error. The reasoning experiments on GSM8K and MATH500 provide valuable evidence for the method's applicability beyond classification.
The paper provides substantial reproducibility support: code is available at a public GitHub repository, and Appendix B details hyperparameters including iteration counts ($T=15$), samples per iteration ($B=15$), elite fraction ($\rho=0.2$), and learning rates. However, the paper does not specify whether experiments were run via API or local inference, nor does it report wall-clock time or total compute cost (e.g., GPU hours), which are necessary for assessing practical feasibility. The Gumbel sampling procedure is deterministic given random seeds, but the paper does not clarify whether reported seeds cover data splits, model initialization, or sampling noise.
In-context learning (ICL) adapts large language models by conditioning on a small set of ICL examples, avoiding costly parameter updates. Among other factors, performance is often highly sensitive to the ordering of the examples. However, exhaustive search over the $n!$ possible orderings is infeasible. Therefore more efficient ordering methods use model confidence measures (e.g., label-probability entropy) over label sets or take a direct approach to finding the best ordering. We propose PLR, a probabilistic approach to in-context example ordering that replaces discrete ordering search with learning a probability distribution over orderings with the Plackett-Luce model. PLR models orderings using a Plackett-Luce distribution and iteratively updates its parameters to concentrate probability mass on high-performing orderings under a task-level metric. Candidate orderings are sampled efficiently via a Gumbel perturb-and-sort procedure. Experiments on multiple classification benchmarks show that PLR consistently improves few-shot accuracy for $k \in \{4, 8, 16, 32\}$ examples, and we further demonstrate gains on mathematical reasoning tasks where label-based ordering methods are not applicable. Our code is available at https://github.com/Batorskq/PLR.
Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.
No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.