Multinoulli Extension: A Lossless Continuous Relaxation for Partition-Constrained Subset Selection

cs.LG math.OC Qixin Zhang, Wei Huang, Yan Sun, Yao Shu, Yi Yu, Dacheng Tao · Mar 23, 2026
Local to this browser
What it does
The paper tackles partition-constrained subset selection for 'close-to-submodular' objectives—specifically α-weakly DR-submodular and (γ,β)-weakly submodular functions—where existing distorted local-search methods suffer from prohibitive...
Why it matters
The authors propose the Multinoulli Extension (ME), a continuous relaxation that learns multinoulli priors for each partition block, enabling lossless rounding without submodularity assumptions. They develop offline (Multinoulli-SCG) and...
Main concern
The paper presents a novel and theoretically rigorous continuous relaxation framework. The Multinoulli Extension is well-motivated with solid proofs of differentiability and inheritance of weak submodularity properties.
Community signal
0
0 up · 0 down
Sign in to vote with arrows
AI Review AI reviewed
Plain-language introduction

The paper tackles partition-constrained subset selection for 'close-to-submodular' objectives—specifically α-weakly DR-submodular and (γ,β)-weakly submodular functions—where existing distorted local-search methods suffer from prohibitive query complexity (˜O(1/ϵ^6)) and require prior knowledge of structural parameters. The authors propose the Multinoulli Extension (ME), a continuous relaxation that learns multinoulli priors for each partition block, enabling lossless rounding without submodularity assumptions. They develop offline (Multinoulli-SCG) and online (Multinoulli-OSCG/OSGA) algorithms achieving tight approximation guarantees with O(1/ϵ^2) query complexity and O(√T) regret, respectively.

Critical review
Verdict
Bottom line

The paper presents a novel and theoretically rigorous continuous relaxation framework. The Multinoulli Extension is well-motivated with solid proofs of differentiability and inheritance of weak submodularity properties. The parameter-free algorithms achieve significant improvements in query complexity over prior distorted local-search methods, and the online extensions represent technical advances. However, practical impact is tempered by reliance on high-variance stochastic gradient estimators and the need for non-oblivious auxiliary functions to achieve optimal approximation ratios, adding implementation complexity.

“Distorted local search generally requires prior knowledge of specific structural parameters... result in an extremely high ˜O(1/ϵ^6) and ˜O(1/ϵ^3) number of value queries”
paper · Section I
“a notable advantage of our ME is its intrinsic capacity to provide a lossless rounding scheme for any set function”
paper · Section IV
What holds up

The Multinoulli Extension is rigorously developed: Theorem 1 establishes preservation of differentiability, monotonicity, and weak DR-submodularity (α-weakly), while Theorem 2 provides the crucial link between continuous and discrete objectives. The 'lossless rounding' claim is substantiated—Algorithm 1 ensures |S ∩ Vk| ≡ Bk and E[f(S)] ≥ F(x), avoiding submodularity requirements of multi-linear extension schemes (Section IV-C, Theorem 5). The query complexity improvement to O(r^3 n^2/ϵ^2) (Table I) is significant and supported by path-integrated differential estimators.

“If f is α-weakly DR-submodular, then F is α-weakly continuous DR-submodular... If f is γ-weakly submodular from below, then F is upper-linearizable”
paper · Theorem 1
“When the set function f is monotone, for any multinoulli priors... the subset S yielded by Algorithm 1 satisfies... E(f(S)) ≥ F(p_1,...,p_K)”
paper · Theorem 5
“Multinoulli-SCG... O(r^3 n^2/ϵ^2)... (1-e^{-α})OPT-ϵ”
paper · Table I
Main concerns

First, while query complexity is reduced, stochastic estimation of gradients (Remark 3) and Hessians (Remark 8) introduces variance that may affect practical convergence; experiments lack variance reporting. Second, achieving tight (1-e^{-α}) approximation requires non-oblivious auxiliary functions (Theorem 4), adding overhead versus standard gradient ascent. Third, Multinoulli-OSCG requires maintaining Q=O(√T) online linear oracles (Algorithm 3), creating memory overhead, though OSGA mitigates this. Finally, the 'parameter-free' claim is weakened for Multinoulli-OSGA with auxiliary estimation, which requires knowing α, β, γ to set weight function w(z) (Table II footnote †), undermining the advantage in online settings.

“we can estimate ∂^2F/∂p_{m_1k_1}∂p_{m_2k_2} as... B_{k_1}B_{k_2}(f(v_{m_1k_1} ∪ S ∪ {v_{m_2k_2}}) - f(v_{m_1k_1} ∪ S))”
paper · Remark 8
“Algorithm 4 with auxiliary gradient estimation... Para-free? ✘”
paper · Table II footnote †
“Initialize: Q online oracles {E(1),...,E(Q)}... for q = 1,...,Q do”
paper · Algorithm 3
Evidence and comparison

Tables I and II provide comprehensive comparisons positioning Multinoulli-SCG favorably against distorted local-search in query complexity. However, the paper lacks comparison with other recent continuous relaxations for non-submodular functions. Experiments on video summarization, coverage, and Bayesian A-optimal design (Section VII) demonstrate competitive utility with fewer queries, but offer limited statistical analysis (no confidence intervals) and omit hyperparameter sensitivity analysis (batch size L, step size η). The tightness claim for stationary points (Remark 7) is supported by a constructed example in Appendix I showing 1/2-approximation is tight for submodular cases.

“Multinoulli-SCG... 8.59... Distorted-LS-G... 10.40 (queries log10 scale)”
paper · Table III
“the point 1_{[n]} is a local (1/2)-approximation stationary point of the ME of the cover function”
paper · Appendix I
“Multinoulli-SCG achieves Top-2 performance in all 8 videos”
paper · Section VII-A
Reproducibility

Detailed algorithmic descriptions and proofs are provided in appendices, aiding reproducibility. However, no code repository is mentioned. Stochastic gradient and Hessian estimation require careful sampling implementation (Remarks 3, 8), and projections use CVX (Appendix A) which may not scale. The offline algorithms require T=O(√(nr)/ϵ) iterations with batch size L=O(T), implying substantial computational cost per iteration despite reduced total queries. Critical hyperparameters (step sizes, batch sizes) are specified in Appendix A, but sensitivity to these choices is not analyzed in the main experiments.

“As for the projection of Step 14 in Algorithm 4, we utilize the CVX optimization tool”
paper · Appendix A
“it is sufficient to estimate up to O(nr) second-order partial derivatives for each point”
paper · Remark 11
“if we set the batch size L=O(T)... subset S output by Algorithm 2 satisfies: E[f(S)] ≥ (1-e^{-α})f(S^*) - O(r√n/T)”
paper · Theorem 7
Abstract

Identifying the most representative subset for a close-to-submodular objective while satisfying the predefined partition constraint is a fundamental task with numerous applications in machine learning. However, the existing distorted local-search methods are often hindered by their prohibitive query complexities and the rigid requirement for prior knowledge of difficult-to-obtain structural parameters. To overcome these limitations, we introduce a novel algorithm titled Multinoulli-SCG, which not only is parameter-free, but also can achieve the same approximation guarantees as the distorted local-search methods with significantly fewer function evaluations. More specifically, when the objective function is monotone $\alpha$-weakly DR-submodular or $(\gamma,\beta)$-weakly submodular, our Multinoulli-SCG algorithm can attain a value of $(1-e^{-\alpha})\text{OPT}-\epsilon$ or $(\frac{\gamma^{2}(1-e^{-(\beta(1-\gamma)+\gamma^2)})}{\beta(1-\gamma)+\gamma^2})\text{OPT}-\epsilon$ with only $O(1/\epsilon^{2})$ function evaluations, where OPT denotes the optimal value. The cornerstone of our Multinoulli-SCG algorithm is an innovative continuous-relaxation framework named Multinoulli Extension(ME), which can effectively convert the discrete subset selection problem subject to partition constraints into a solvable continuous maximization focused on learning the optimal multinoulli priors across the concerned partition. In sharp contrast with the well-established multi-linear extension for submodular subset selection, a notable advantage of our proposed ME is its intrinsic capacity to provide a lossless rounding scheme for any set function. Furthermore, based on our proposed ME, we also present two novel online algorithms, namely, Multinoulli-OSCG and Multinoulli-OSGA, for the unexplored online subset selection problems over partition constraints.

Challenge the Review

Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.

No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.