Constrained Online Convex Optimization with Memory and Predictions

cs.LG stat.ML Mohammed Abdullah, George Iosifidis, Salah Eddine Elayoubi, Tijani Chahed · Mar 22, 2026

What it does

Why it matters

The authors propose the first algorithms achieving sublinear regret and cumulative constraint violation (CCV) under adversarial, time-varying constraints, both with and without unreliable predictions of future gradients. This work bridges...

Main concern

Community signal

0 up · 0 down

AI Review AI reviewed

Plain-language introduction

The paper tackles Constrained Online Convex Optimization with Memory (COCO-M), where both losses and constraints depend on a window of past decisions, capturing realistic scenarios like smart-grid budgets and battery health limits. The authors propose the first algorithms achieving sublinear regret and cumulative constraint violation (CCV) under adversarial, time-varying constraints, both with and without unreliable predictions of future gradients. This work bridges the gap between classical constrained OCO and practical memory-dependent control problems.

Critical review

Verdict

Bottom line

The paper introduces the first algorithms for COCO-M and COCO-M² (memory in both losses and constraints) achieving sublinear regret and CCV, tightening prior bounds from $\mathcal{O}(T^{2/3}\log^2 T)$ to $\mathcal{O}(m^{3/2}\sqrt{T\log T})$ regret for the full memory case. However, the optimistic results with predictions rely on strong separability and linearity assumptions (Assumptions 4-5), and the benchmark set $\mathcal{X}_T^{mp}$ is strictly more restrictive than the standard constrained benchmark set $\mathcal{X}_T^m$.

“$\mathcal{R}_{T}^{mc} = \mathcal{O}\big(m^{\frac{3}{2}}\sqrt{T\log(T)}\big)$”

paper · Theorem 1

“$\mathcal{V}_{T}^{mc} = \mathcal{O}\left(\max\{T^{3/4}, m^{\frac{3}{2}}\sqrt{T\log(T)}\right)$”

paper · Theorem 1

What holds up

The penalty method with adaptive quadratic penalty $\Phi(V) = \lambda V^2$ elegantly handles the coupling between constraint violation and objective, and the reduction from memory-dependent to memory-less problems via Lipschitz continuity is technically sound. The reinterpretation of the memory problem as delayed feedback through the forward function $Z_t(x_t)$ is a clever reformulation that enables application of optimistic delayed AdaFTRL while accommodating unreliable predictions.

“we reinterpret the problem as online learning with delayed feedback and design an optimistic algorithm whose performance improves as prediction accuracy improves”

paper · Section 5

Main concerns

The optimistic algorithm requires restrictive assumptions: Assumption 4 (Separability) mandates that memory functions decompose into per-round components $f_t(x_{t-m}^t) = \sum_{i=0}^m f_t^i(x_{t-i})$, and Assumption 5 (Linearity) requires these components to be linear, excluding general convex losses. The benchmark $\mathcal{X}_T^{mp} = \{x \in \mathcal{X}: g_t^i(x) \leq 0, \forall t, i \leq m\}$ is more restrictive than $\mathcal{X}_T^m$ as it requires satisfying all decomposed constraint components individually rather than the aggregate constraints. The comparison to Liu et al. (2023) is limited: their $\mathcal{O}(T^{2/3}\log^2 T)$ bound applies when $m = \log T$, while this paper's improvement to $\mathcal{O}(m^{3/2}\sqrt{T\log T})$ holds for arbitrary $m$, making the regimes not fully comparable.

“Every memory-based function can be decomposed into components, each depending on a decision from a different round”

paper · Assumption 4

“$\mathcal{X}_{T}^{mp} = \bigl\{\,x\in\mathcal{X}:g_{t}^{\,i}(x)\leq 0,\ \forall t\in\mathcal{T},\ i\leq m\bigr\}$”

paper · Section 5

Evidence and comparison

The evidence supports the claims for the no-prediction setting, with explicit bounds provided in Theorems 1 and 2 that improve upon the $\mathcal{O}(T^{2/3}\log^2 T)$ rates of Liu et al. (2023) for COCO-M² when $T \in [3, 10^{49}]$. The paper acknowledges that when $m=0$, its bounds ($\mathcal{O}(\sqrt{T})$ regret, $\mathcal{O}(T^{3/4})$ CCV) are looser than Sinha and Vaze (2024), which achieve $\mathcal{O}(\sqrt{T}\log T)$ CCV. The optimistic bounds recover prior art when specialized: they match Lekeufack and Jordan (2024) for COCO without memory ($m=0$) and Mhaisen and Iosifidis (2024) for unconstrained OCO-M.

“improving upon the $\mathcal{R}_{T},\mathcal{V}_{T} = \mathcal{O}(T^{2/3}\log^{2}T)$ bounds of (Liu et al., 2023) for $T \in [3, 10^{49}]$”

paper · Section 4

“When memory vanishes ($m=0$), our bounds collapse to $\mathcal{O}(\sqrt{T})$ for the regret and $\mathcal{O}(T^{3/4})$ for the CCV; which are looser than the bounds of (Sinha and Vaze, 2024)”

paper · Section 4

Reproducibility

While Algorithm 1 is clearly specified, critical implementation details for the optimistic variant are deferred to the appendix, including the full ODAF update rule. Hyperparameters such as $\lambda$ in Theorem 3 depend on the unknown total prediction error $\mathcal{E}_T(g^+)$; while a doubling trick is mentioned, its integration increases complexity. The code repository URL is provided in Appendix A, though the paper notes this is for future release. Full reproduction would require the specific ODAF implementation details and experimental scripts omitted from the main text.

“The details for ODAF are deferred to Appendix”

paper · Section 5

“$\lambda = \frac{1}{2\left(C\sqrt{\mathcal{E}_{T}(g^{+})}+G(m+1)\right)}$”

paper · Theorem 3

“All code and data used for the computational experiment will be made publicly available at: https://github.com/mohammadabduallah/COCO-Memory-AAAI26”

paper · Appendix A

Abstract

We study Constrained Online Convex Optimization with Memory (COCO-M), where both the loss and the constraints depend on a finite window of past decisions made by the learner. This setting extends the previously studied unconstrained online optimization with memory framework and captures practical problems such as the control of constrained dynamical systems and scheduling with reconfiguration budgets. For this problem, we propose the first algorithms that achieve sublinear regret and sublinear cumulative constraint violation under time-varying constraints, both with and without predictions of future loss and constraint functions. Without predictions, we introduce an adaptive penalty approach that guarantees sublinear regret and constraint violation. When short-horizon and potentially unreliable predictions are available, we reinterpret the problem as online learning with delayed feedback and design an optimistic algorithm whose performance improves as prediction accuracy improves, while remaining robust when predictions are inaccurate. Our results bridge the gap between classical constrained online convex optimization and memory-dependent settings, and provide a versatile learning toolbox with diverse applications.

Challenge the Review

Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.

Challenges are public to read, but only signed-in members can post them. Your challenge text is stored with your account for moderation, but usernames are not shown in the public thread.

No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.