CRPS-Optimal Binning for Conformal Regression

cs.LG stat.ML Paolo Toccaceli · Mar 23, 2026
Local to this browser
What it does
This paper addresses conditional distribution estimation for regression by proposing a non-parametric binning approach. Observations sorted by a one-dimensional covariate are partitioned into contiguous bins via dynamic programming,...
Why it matters
Observations sorted by a one-dimensional covariate are partitioned into contiguous bins via dynamic programming, minimizing a closed-form leave-one-out CRPS cost function. The method produces conformal prediction sets with finite-sample...
Main concern
The paper presents a technically sound synthesis of optimal binning and conformal prediction. The derivation of the LOO-CRPS cost function $cost(S) = \frac{m}{(m-1)^2} \sum_{\ell < r} |y_\ell - y_r|$ enables exact global optimization via...
Community signal
0
0 up · 0 down
Sign in to vote with arrows
AI Review AI reviewed
Plain-language introduction

This paper addresses conditional distribution estimation for regression by proposing a non-parametric binning approach. Observations sorted by a one-dimensional covariate are partitioned into contiguous bins via dynamic programming, minimizing a closed-form leave-one-out CRPS cost function. The method produces conformal prediction sets with finite-sample marginal coverage guarantees and connects to Venn predictors, offering substantially narrower intervals than standard split-conformal methods on heteroscedastic and bimodal benchmarks.

Critical review
Verdict
Bottom line

The paper presents a technically sound synthesis of optimal binning and conformal prediction. The derivation of the LOO-CRPS cost function $cost(S) = \frac{m}{(m-1)^2} \sum_{\ell < r} |y_\ell - y_r|$ enables exact global optimization via dynamic programming in $O(n^2 K)$ time, avoiding the local optima of greedy segmentation. The cross-validated selection of $K$ via alternating splits correctly addresses in-sample optimism. While the coverage guarantee (Proposition 2) holds under exact exchangeability, the authors transparently acknowledge that data-dependent binning introduces approximation error. The empirical evaluation on Old Faithful and motorcycle accident datasets demonstrates genuine efficiency gains over CQR and quantile regression forests.

“cost(S) = \frac{m}{(m-1)^2} \sum_{\ell < r, \ell,r \in S} |y_\ell - y_r|”
paper · Proposition 1
“In the present method the partition is data-dependent and... membership exchangeability within each bin is approximate rather than exact.”
paper · Section 8.2
What holds up

The closed-form cost derivation (Proposition 1) is elegant and correct, reducing the leave-one-out CRPS to a scalar function of pairwise absolute differences. The dynamic programming approach guarantees global optimality via the optimal substructure property: "the recurrence propagates exact optima at each step, so no feasible partition is overlooked" (Section 11.1). The connection to Venn predictors provides theoretical grounding for the within-bin empirical CDF approach. Empirically, the method achieves 11-40% narrower prediction intervals than competitors while maintaining near-nominal coverage.

“The recurrence propagates exact optima at each step, so no feasible partition is overlooked.”
paper · Section 11.1
“Our method (full n): 90.3±0.1% coverage, 1.200±0.001 mean width vs Gaussian split conformal: 91.2±0.0%, 1.683±0.006”
paper · Table 3
Main concerns

The primary theoretical limitation is the approximate nature of the coverage guarantee due to data-dependent bins. The DP optimization uses all $y$-values to locate boundaries, violating the exchangeability assumption required for exact finite-sample coverage. This manifests as slight under-coverage (87.0% vs 90% nominal) when restricted to $n/2$ samples. The method is also restricted to one-dimensional covariates; extensions to higher dimensions via greedy splitting lose the global optimality guarantee: "The DP tractability... is entirely one-dimensional" (Section 12). Additionally, the $O(n^2 \log n)$ precomputation may limit scalability. The small-sample behavior of the cross-validation criterion can select $K$ implying bins with few observations, leading to coarse intervals: "In small datasets the variance of TestCRPS$(K)$ is high" (Section 7.1).

“In the present method the partition is data-dependent and, more specifically, the DP uses all y-values to locate boundaries, so membership exchangeability within each bin is approximate rather than exact.”
paper · Section 8.2
“The method requires a one-dimensional covariate and a contiguous binning structure.”
paper · Section 12
“In small datasets the variance of TestCRPS(K) is high, and the selected K* may still imply bins with few observations”
paper · Section 7.1
Evidence and comparison

Comparisons to Gaussian split conformal, CQR (cubic), and CQR-QRF on Old Faithful and motorcycle benchmarks are fair and reveal genuine efficiency improvements. On Old Faithful, the full-data method achieves 1.20 min mean width versus 1.68 min for Gaussian split conformal. The authors responsibly report matched-sample comparisons: "Applying our method to only the training half... gives slightly sub-nominal coverage (88.3±0.3%)" (Section 10.1). The diagnosis of in-sample optimism for within-sample LOO-CRPS is validated by the monotone decreasing curve in Figure 2, contrasting with the U-shaped test CRPS curve used for proper model selection.

“Applying our method to only the training half (italic row; matched sample size as the competitors) gives slightly sub-nominal coverage (88.3±0.3%) at a modest width cost (1.26 min)”
paper · Section 10.1
“The within-sample criterion is nearly monotone decreasing, confirming its susceptibility to in-sample optimism. The test CRPS has a clear U-shape with minimum at K*=5.”
paper · Figure 2
Reproducibility

Reproducibility is strong. Algorithm 1 provides detailed pseudocode for the DP including Fenwick tree updates. The Python package crpsconfreg is available on PyPI and GitHub with scripts to reproduce all figures. Hyperparameters are explicitly stated: $K_{\max} = \lfloor n/10 \rfloor$ and minimum bin size $m_{\min}=2$. The method is deterministic given the data. The preprocessing for $O(\log n)$ updates is described in sufficient detail for independent implementation. The experimental protocol uses standard datasets (faithful, mcycle) with 200 random splits for statistics.

“Phase 1: Precompute cost matrix... Insert $y_j$ into $T_{\mathrm{cnt}}$ and $T_{\mathrm{sum}}$...”
paper · Algorithm 1
“We set $K_{\max} = \lfloor n/10 \rfloor$ throughout, ensuring the minimum expected bin size on the training half is at least 10”
paper · Section 7.1
Abstract

We propose a method for non-parametric conditional distribution estimation based on partitioning covariate-sorted observations into contiguous bins and using the within-bin empirical CDF as the predictive distribution. Bin boundaries are chosen to minimise the total leave-one-out Continuous Ranked Probability Score (LOO-CRPS), which admits a closed-form cost function with $O(n^2 \log n)$ precomputation and $O(n^2)$ storage; the globally optimal $K$-partition is recovered by a dynamic programme in $O(n^2 K)$ time. Minimisation of Within-sample LOO-CRPS turns out to be inappropriate for selecting $K$ as it results in in-sample optimism. So we instead select $K$ by evaluating test CRPS on an alternating held-out split, which yields a U-shaped criterion with a well-defined minimum. Having selected $K^*$ and fitted the full-data partition, we form two complementary predictive objects: the Venn prediction band and a conformal prediction set based on CRPS as the nonconformity score, which carries a finite-sample marginal coverage guarantee at any prescribed level $\varepsilon$. On real benchmarks against split-conformal competitors (Gaussian split conformal, CQR, and CQR-QRF), the method produces substantially narrower prediction intervals while maintaining near-nominal coverage.

Challenge the Review

Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.

No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.