Constrained Online Convex Optimization with Memory and Predictions
The paper tackles Constrained Online Convex Optimization with Memory (COCO-M), where both losses and constraints depend on a window of past decisions, capturing realistic scenarios like smart-grid budgets and battery health limits. The authors propose the first algorithms achieving sublinear regret and cumulative constraint violation (CCV) under adversarial, time-varying constraints, both with and without unreliable predictions of future gradients. This work bridges the gap between classical constrained OCO and practical memory-dependent control problems.
The paper introduces the first algorithms for COCO-M and COCO-M² (memory in both losses and constraints) achieving sublinear regret and CCV, tightening prior bounds from $\mathcal{O}(T^{2/3}\log^2 T)$ to $\mathcal{O}(m^{3/2}\sqrt{T\log T})$ regret for the full memory case. However, the optimistic results with predictions rely on strong separability and linearity assumptions (Assumptions 4-5), and the benchmark set $\mathcal{X}_T^{mp}$ is strictly more restrictive than the standard constrained benchmark set $\mathcal{X}_T^m$.
The penalty method with adaptive quadratic penalty $\Phi(V) = \lambda V^2$ elegantly handles the coupling between constraint violation and objective, and the reduction from memory-dependent to memory-less problems via Lipschitz continuity is technically sound. The reinterpretation of the memory problem as delayed feedback through the forward function $Z_t(x_t)$ is a clever reformulation that enables application of optimistic delayed AdaFTRL while accommodating unreliable predictions.
The optimistic algorithm requires restrictive assumptions: Assumption 4 (Separability) mandates that memory functions decompose into per-round components $f_t(x_{t-m}^t) = \sum_{i=0}^m f_t^i(x_{t-i})$, and Assumption 5 (Linearity) requires these components to be linear, excluding general convex losses. The benchmark $\mathcal{X}_T^{mp} = \{x \in \mathcal{X}: g_t^i(x) \leq 0, \forall t, i \leq m\}$ is more restrictive than $\mathcal{X}_T^m$ as it requires satisfying all decomposed constraint components individually rather than the aggregate constraints. The comparison to Liu et al. (2023) is limited: their $\mathcal{O}(T^{2/3}\log^2 T)$ bound applies when $m = \log T$, while this paper's improvement to $\mathcal{O}(m^{3/2}\sqrt{T\log T})$ holds for arbitrary $m$, making the regimes not fully comparable.
The evidence supports the claims for the no-prediction setting, with explicit bounds provided in Theorems 1 and 2 that improve upon the $\mathcal{O}(T^{2/3}\log^2 T)$ rates of Liu et al. (2023) for COCO-M² when $T \in [3, 10^{49}]$. The paper acknowledges that when $m=0$, its bounds ($\mathcal{O}(\sqrt{T})$ regret, $\mathcal{O}(T^{3/4})$ CCV) are looser than Sinha and Vaze (2024), which achieve $\mathcal{O}(\sqrt{T}\log T)$ CCV. The optimistic bounds recover prior art when specialized: they match Lekeufack and Jordan (2024) for COCO without memory ($m=0$) and Mhaisen and Iosifidis (2024) for unconstrained OCO-M.
While Algorithm 1 is clearly specified, critical implementation details for the optimistic variant are deferred to the appendix, including the full ODAF update rule. Hyperparameters such as $\lambda$ in Theorem 3 depend on the unknown total prediction error $\mathcal{E}_T(g^+)$; while a doubling trick is mentioned, its integration increases complexity. The code repository URL is provided in Appendix A, though the paper notes this is for future release. Full reproduction would require the specific ODAF implementation details and experimental scripts omitted from the main text.
We study Constrained Online Convex Optimization with Memory (COCO-M), where both the loss and the constraints depend on a finite window of past decisions made by the learner. This setting extends the previously studied unconstrained online optimization with memory framework and captures practical problems such as the control of constrained dynamical systems and scheduling with reconfiguration budgets. For this problem, we propose the first algorithms that achieve sublinear regret and sublinear cumulative constraint violation under time-varying constraints, both with and without predictions of future loss and constraint functions. Without predictions, we introduce an adaptive penalty approach that guarantees sublinear regret and constraint violation. When short-horizon and potentially unreliable predictions are available, we reinterpret the problem as online learning with delayed feedback and design an optimistic algorithm whose performance improves as prediction accuracy improves, while remaining robust when predictions are inaccurate. Our results bridge the gap between classical constrained online convex optimization and memory-dependent settings, and provide a versatile learning toolbox with diverse applications.
Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.
No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.