RAMPAGE: RAndomized Mid-Point for debiAsed Gradient Extrapolation

cs.LG math.OC Abolfazl Hashemi · Mar 23, 2026
Local to this browser
What it does
RAMPAGE addresses discretization bias in Extragradient (EG) methods for variational inequalities by replacing the deterministic midpoint with randomized sampling. The core idea uses uniform sampling to construct an unbiased estimator of...
Why it matters
The core idea uses uniform sampling to construct an unbiased estimator of the continuous-time flow integral, while RAMPAGE+ leverages antithetic variates to eliminate first-order variance terms. This matters for training GANs and other...
Main concern
The paper offers a novel theoretical perspective on debiasing EG through stochastic integration. The analysis showing that antithetic sampling yields deterministic convergence bounds for RAMPAGE+ (despite randomization) is elegant and...
Community signal
0
0 up · 0 down
Sign in to vote with arrows
AI Review AI reviewed
Plain-language introduction

RAMPAGE addresses discretization bias in Extragradient (EG) methods for variational inequalities by replacing the deterministic midpoint with randomized sampling. The core idea uses uniform sampling to construct an unbiased estimator of the continuous-time flow integral, while RAMPAGE+ leverages antithetic variates to eliminate first-order variance terms. This matters for training GANs and other non-conservative games where EG's $\mathcal{O}(\eta^2)$ bias causes divergence in highly nonlinear regimes.

Critical review
Verdict
Bottom line

The paper offers a novel theoretical perspective on debiasing EG through stochastic integration. The analysis showing that antithetic sampling yields deterministic convergence bounds for RAMPAGE+ (despite randomization) is elegant and technically sound. However, the practical impact is limited by synthetic experiments only, no code release, and the fact that RAMPAGE+ requires two gradient evaluations per iteration—matching EG's cost without demonstrating clear computational advantages on real problems.

“min_{0\leq l\leq k}\|F(\theta_{l})\|^{2}\leq\frac{\|\theta_{0}-\theta^{*}\|^{2}}{C_{RA+}(k+1)}”
Hashemi, Sec. 5 · Theorem 5
What holds up

The bias analysis of EG is rigorous: Appendix A correctly derives the bias $\text{Bias}=-\frac{1}{6}\eta^{2}H_{t}[F_{t},F_{t}]$ and shows RAMPAGE is unbiased with variance $\frac{1}{3}\eta^{2}\|J_{t}F_{t}\|^{2}+\mathcal{O}(\eta^{4})$, while RAMPAGE+ reduces variance to $\frac{1}{45}\eta^{4}\|H_{t}[F_{t},F_{t}]\|^{2}$. The convergence proofs under co-coercivity, co-hypomonotonicity, and $\alpha$-symmetric $(L_0,L_1)$-Lipschitzness are comprehensive, and the symmetric scaling workaround for constrained VIs is clever.

“EG: Bias=-\frac{1}{6}\eta^{2}H_{t}[F_{t},F_{t}], Var=0 ... RAMPAGE+: Bias=0, Var=\frac{1}{45}\eta^{4}\|H_{t}[F_{t},F_{t}]\|^{2}”
Hashemi, Appendix A · Proposition 2
Main concerns

First, computational cost: RAMPAGE+ requires two gradient evaluations per iteration (like EG), yet the paper does not demonstrate wall-clock improvements over EG or other stabilized methods (e.g., Optimistic Gradient). Second, step size restrictions: In the co-coercive setting, RAMPAGE+ permits $\eta\leq 1/(\sqrt{3}L)$ (Theorem 5) versus RAMPAGE's $\eta\leq 1/(\sqrt{2}L)$ (Theorem 1), suggesting the variance reduction comes at the price of smaller step sizes. Third, inconsistency in constrained settings: The symmetrically-scaled SS-RAMPAGE+ differs from RAMPAGE+ when $\mathcal{X}=\mathbb{R}^{p}$, which the paper acknowledges as unresolved. Finally, the experiments are limited to low-dimensional synthetic problems (10D, 20D, 2D) without validation on actual GAN training despite the NTK motivation in Section 7.

“Note that when u_t=1/2 we recover the constrained version of (EG), but the method differs from (RAMPAGE) when $\mathcal{X}=\mathbb{R}^{p}$. We leave rectifying this as a future work.”
Evidence and comparison

The theoretical evidence is strong: multiple theorems establish $\mathcal{O}(1/k)$ rates under progressively weaker assumptions (co-coercivity $\rightarrow$ co-hypomonotonicity $\rightarrow$ generalized Lipschitz). The claim that RAMPAGE+ achieves deterministic bounds while RAMPAGE only guarantees expected convergence is substantiated by comparing Theorem 1 (expected norm) versus Theorem 5 (deterministic norm). However, the numerical verification compares only against EG and does not include other accelerated or stabilized methods such as Halpern iteration or EAG, which also achieve $\mathcal{O}(1/k)$ rates. The NTK argument linking high-frequency components to GAN training (Section 7) is speculative without empirical validation on neural networks.

“A key distinction lies in the realization dependence of the trajectory. The bounds derived for (RAMPAGE) are valid in expectation. Conversely, by perfectly correlating the antithetic exploration samples, (RAMPAGE+) removes the variance completely.”
Hashemi, Sec. 5 · Remark 2
Reproducibility

No code repository or implementation details are provided, limiting reproducibility. While hyperparameters (step sizes, problem dimensions) are specified in Section 7, random seeds and exact initialization distributions are not detailed. The proofs in Appendices C-E appear complete but rely on standard techniques from variational inequality literature; independent verification would require substantial effort. The synthetic test functions (4th-order polynomial, high-frequency sinusoidal games) are reproducible from equations (36)-(40), though the claim that these model 'infinite-width GANs' is not experimentally validated.

“We employ the same initialization and a constant step size $\eta$ across all three methods... This step size is explicitly selected to lie precisely at the instability threshold where (EG) diverges.”
Hashemi, Sec. 7 · Experimental Setup
Abstract

A celebrated method for Variational Inequalities (VIs) is Extragradient (EG), which can be viewed as a standard discrete-time integration scheme. With this view in mind, in this paper we show that EG may suffer from discretization bias when applied to non-linear vector fields, conservative or otherwise. To resolve this discretization shortcoming, we introduce RAndomized Mid-Point for debiAsed Gradient Extrapolation (RAMPAGE) and its variance-reduced counterpart, RAMPAGE+ which leverages antithetic sampling. In contrast with EG, both methods are unbiased. Furthermore, leveraging negative correlation, RAMPAGE+ acts as an unbiased, geometric path-integrator that completely removes internal first-order terms from the variance, provably improving upon RAMPAGE. We further demonstrate that both methods enjoy provable $\mathcal{O}(1/k)$ convergence guarantees for a range of problems including root finding under co-coercive, co-hypomonotone, and generalized Lipschitzness regimes. Furthermore, we introduce symmetrically scaled variants to extend our results to constrained VIs. Finally, we provide convergence guarantees of both methods for stochastic and deterministic smooth convex-concave games. Somewhat interestingly, despite being a randomized method, RAMPAGE+ attains purely deterministic bounds for a number of the studied settings.

Challenge the Review

Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.

No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.