Learning operators on labelled conditional distributions with applications to mean field control of non exchangeable systems

math.OC math.PR stat.ML Samy Mekkaoui, Huyên Pham, Xavier Warin · Mar 23, 2026
AI Review
Plain-language introduction

This paper develops a neural operator framework for approximating mappings defined on constrained Wasserstein spaces $\mathcal{M}_\lambda$, consisting of probability measures on $I \times \mathbb{R}^d$ with prescribed marginal $\lambda$ on the label space $I$. The core contribution is the DeepONetCyl architecture, which combines cylindrical moment approximations $\Phi_J(\mu) = (\langle \varphi_1, \mu \rangle, \ldots, \langle \varphi_J, \mu \rangle)$ with a DeepONet-type branch–trunk structure to preserve the marginal constraint. This enables learning of heterogeneous (non-exchangeable) mean-field control problems where agent interactions depend on labels, extending prior neural methods beyond the exchangeable case.
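
As a rough illustration of the branch–trunk structure described above, the following Python sketch evaluates $\sum_{k=1}^{r}\mathcal{T}_k(u,x)\,\mathcal{B}_k(\Phi_J(\mu))$ on an empirical measure. It assumes $d = 1$, moment maps of the form $\varphi_i(u,x) = |x|^i + u^i$ (the choice the review quotes from Section 2.4), and small tanh networks with random weights. The helper names (`moment_features`, `init_mlp`, `mlp`, `deeponet_cyl`) are hypothetical and not taken from the paper.

```python
import numpy as np

def moment_features(u, x, J):
    """Empirical cylindrical moments Phi_J(mu) for mu = (1/N) sum_n delta_(u_n, x_n).

    Assumes phi_i(u, x) = |x|^i + u^i for i = 1..J and scalar states (d = 1).
    """
    powers = np.arange(1, J + 1)                                  # (J,)
    phi = np.abs(x)[:, None] ** powers + u[:, None] ** powers     # (N, J)
    return phi.mean(axis=0)                                       # (J,)

def init_mlp(rng, sizes):
    """Random (W, b) pairs for a dense network with the given layer sizes."""
    return [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(z, weights):
    """Tanh MLP with a linear output layer."""
    for W, b in weights[:-1]:
        z = np.tanh(z @ W + b)
    W, b = weights[-1]
    return z @ W + b

def deeponet_cyl(u_q, x_q, u, x, trunk, branch, J):
    """Evaluate sum_k T_k(u_q, x_q) * B_k(Phi_J(mu)) at Q query points."""
    t = mlp(np.stack([u_q, x_q], axis=1), trunk)    # trunk output, shape (Q, r)
    b = mlp(moment_features(u, x, J), branch)       # branch output, shape (r,)
    return t @ b                                    # shape (Q,)
```

The measure enters only through the $J$-dimensional vector `moment_features(u, x, J)`, which is what makes the representation finite-dimensional. The labels `u` are assumed to have been drawn from $\lambda$, so the marginal constraint is carried by the samples rather than by the network.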

Critical review
Verdict
Bottom line

The paper delivers a rigorous universal approximation theorem (Theorem 2.1) for continuous operators on $\mathcal{M}_\lambda$ and proposes practical sampling algorithms for generating training measures. However, the numerical experiments are limited to simple synthetic mean-field functionals (linear and quadratic expectations) rather than the full optimal control problems (FBSDE/HJB systems) developed in Section 4. The theoretical claims regarding MFC applications thus remain experimentally unsubstantiated in the provided text, and the work would benefit from benchmarking against particle-based or classical numerical methods.

“Then, for all $\epsilon > 0$, there exists $J,r \in \mathbb{N}^{\star}$ ... such that $\int_{\mathcal{M}_{\lambda}}\mathbb{E}_{(U,X)\sim\mu}\Big[\big|V(U,X,\mu)-\sum_{k=1}^{r}\mathcal{T}_{k}(U,X)\mathcal{B}_{k}(\Phi_{J}(\mu))\big|^{2}\Big]\rho(\mathrm{d}\mu)\leq\epsilon$”
Theorem 2.1 · Section 2.1
“To the best of our knowledge, neural operator approximations for such operators on constrained measure spaces arising in heterogeneous mean-field control have not been investigated.”
Mekkaoui et al. · Related work
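
The integrated error in Theorem 2.1 can be estimated by Monte Carlo when measures are represented by particles. The sketch below is one way to do that, assuming each sampled measure is a pair of arrays `(u, x)` and that both the target and the model are callables with the hypothetical signature `f(u_q, x_q, u, x)`; none of these names come from the paper.

```python
import numpy as np

def empirical_l2_error(V, model, measures):
    """Monte Carlo estimate of the integrated squared error in Theorem 2.1.

    `measures` is a list of (u, x) particle arrays, one per measure drawn from rho.
    The inner expectation over (U, X) ~ mu is replaced by an average over the
    particles of that same measure.
    """
    per_measure = []
    for u, x in measures:
        target = V(u, x, u, x)        # V(U, X, mu) evaluated at the particles of mu
        pred = model(u, x, u, x)      # network output at the same points
        per_measure.append(np.mean((target - pred) ** 2))
    return float(np.mean(per_measure))
```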
What holds up

The universal approximation result is theoretically sound: it leverages Stone–Weierstrass arguments to show that the DeepONetCyl architecture approximates continuous operators on $\mathcal{M}_\lambda$ (continuity being taken with respect to the Wasserstein distance) to arbitrary accuracy in the $L^2(\rho)$ sense. The proof handles the marginal constraint $\mathrm{pr}_1 \sharp \mu = \lambda$ through cylindrical approximations. Additionally, Lemma 2.3 and the sampling strategies (S1, S2) provide constructive, measure-theoretically grounded procedures for generating training data in $\mathcal{M}_\lambda$ using transport maps and randomization lemmas.

“The proof combines cylindrical approximations of probability measures with a DeepONet-type branch–trunk neural architecture, yielding finite-dimensional representations of such operators.”
Mekkaoui et al. · Main contributions
“Let $\nu$ be a non-atomic reference probability ... Then, there exists a measurable map $T \in L^{2}(\lambda \otimes \nu; \mathbb{R}^{d})$ such that $T(u,\cdot)\sharp\nu = \mu^{u}$ $\lambda(\mathrm{d}u)$-a.e.”
Lemma 2.3 · Section 2.3
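
To make the transport-map representation in Lemma 2.3 concrete, here is a minimal Python sketch that draws particles from one random element of $\mathcal{M}_\lambda$. It assumes $I = [0,1]$ with $\lambda$ uniform, $\nu$ standard normal, $d = 1$, and a randomly drawn affine map $T(u,z) = m(u) + s(u)\,z$. This family of maps is purely illustrative and is not the paper's S1 or S2 strategy.

```python
import numpy as np

def sample_measure(rng, n_particles):
    """Draw particles (u_n, x_n) from a random measure with first marginal lambda."""
    # Labels from the prescribed marginal lambda (assumed uniform on [0, 1]).
    u = rng.uniform(0.0, 1.0, size=n_particles)
    # Reference noise from nu (assumed standard normal, hence non-atomic).
    z = rng.normal(size=n_particles)
    # Random coefficients define one transport map T(u, z) = m(u) + s(u) * z.
    a, b = rng.normal(size=2)
    c = rng.uniform(0.5, 1.5)
    m = a + b * np.cos(2.0 * np.pi * u)     # label-dependent mean
    s = c * (1.0 + 0.5 * u)                 # label-dependent positive scale
    x = m + s * z                           # x_n = T(u_n, z_n), so X | U = u ~ T(u, .) # nu
    return u, x
```

Because the labels are always drawn from the same $\lambda$, every sampled measure satisfies the marginal constraint by construction; only the conditional laws $\mu^u$ change from draw to draw.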
Main concerns

The primary limitation is the disconnect between theory and experiment: Section 4 elaborates complex algorithms for MFC via the maximum principle (FBSDE) and dynamic programming (HJB), yet Section 3 validates only the simple test functions $V_1(u,x,\mu) = x - \mathbb{E}_{(U,X)\sim\mu}[G(u,U)X]$ and $V_2(u,x,\mu) = \mathbb{E}_{(U,X)\sim\mu}[(x-G(u,U)X)^2]$. No experiments demonstrate actual control problem solutions or the claimed decoupling fields. Furthermore, the paper provides no sample complexity bounds, no scaling analysis with respect to label space dimension, and no comparisons against particle-based Monte Carlo methods, leaving the computational advantage unquantified.

“We test our algorithms by computing the mean-squared error (MSE) for different cases of non exchangeable mean-field functions $V$ on $I \times \mathbb{R} \times \mathcal{M}_{\lambda}$.”
Section 3 · Numerical experiments
“Non exchangeable mean field network approximation of a map”
Algorithm 1 · Section 2.4
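
For reference, the two test functionals named in the main concerns are straightforward to evaluate on an empirical measure, which is part of why they are a weak proxy for a full control problem. The sketch below assumes $d = 1$ and takes the graphon `G` as a user-supplied vectorized callable; the function names are hypothetical, and no specific graphon from the paper is reproduced here.

```python
import numpy as np

def V1(u_q, x_q, u, x, G):
    """V_1(u, x, mu) = x - E_{(U,X)~mu}[G(u, U) X] on an empirical measure."""
    Gm = G(u_q[:, None], u[None, :])                  # (Q, N) interaction weights
    return x_q - (Gm * x[None, :]).mean(axis=1)       # (Q,)

def V2(u_q, x_q, u, x, G):
    """V_2(u, x, mu) = E_{(U,X)~mu}[(x - G(u, U) X)^2] on an empirical measure."""
    Gm = G(u_q[:, None], u[None, :])                  # (Q, N)
    return ((x_q[:, None] - Gm * x[None, :]) ** 2).mean(axis=1)
```

With the placeholder graphon `G = lambda a, b: a * b`, particles `u = [0, 1]`, `x = [2, 4]`, and the query point `(u, x) = (1, 3)`, these give $V_1 = 3 - (0 + 4)/2 = 1$ and $V_2 = (9 + 1)/2 = 5$.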
Evidence and comparison

The evidence supports approximation capability for the specific synthetic functionals tested, showing that Spline KAN and P1KAN architectures outperform standard feedforward networks for irregular graphons like $G_2$. However, the comparison is limited to internal architectural variants and does not include baseline MFC solvers or the homogeneous neural methods cited in [30, 31]. The claim that this extends previous neural approaches to the heterogeneous setting remains qualitative, as no quantitative comparison (accuracy vs. computational cost) is provided against exchangeable approximations or discretized particle systems.

“On Figure 6, we show that the KAN networks converge better and faster than the feedforward. As the functions to approximate are rather irregular, the P1KAN network outperforms the two other ones.”
Section 3 · Numerical results
“This extends neural mean-field control methods previously developed for homogeneous (exchangeable) systems to the heterogeneous setting.”
Related work · Section 1
Reproducibility

While the paper provides Algorithm 1 and some implementation details (ADAM optimizer, learning rate $0.001$, architectures with 3 hidden layers of 10 neurons), critical reproducibility elements are missing. No code repository or supplementary materials are referenced, exact training times are reported only selectively (e.g., "100 iterations takes 3.56 seconds"), and random seeds or precise initialization schemes are unspecified. The description of the cylindrical feature maps $\varphi_i(u,x) = |x|^i + u^i$ is provided, but the exact hyperparameter grids for $J$, $r$, and the KAN grid sizes (mentioned as 5 or 10) are not fully documented, which would hinder independent reproduction of the convergence curves.

“All the tests are achieved using the ADAM optimization method with a learning rate of $0.001$ ... The default value for $r$ is $10$ and we consider two architectures ... 3 hidden layers of 10 neurons.”
Section 3 · Implementation details
“for $1 \leq i \leq J$, we choose the moment maps $\varphi_{i}(u,x) := |x|^{i} + u^{i}$”
Section 2.4 · Training
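
One way to make the reproducibility gap explicit is to write the reported settings down as a configuration object and leave the unreported ones empty. The sketch below records only the values mentioned in this review; the class and field names are hypothetical, and `J` and `seed` are left as `None` because the review could not find them specified.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ExperimentConfig:
    optimizer: str = "adam"                    # reported: ADAM
    learning_rate: float = 1e-3                # reported: 0.001
    r: int = 10                                # reported default number of branch/trunk outputs
    hidden_layers: int = 3                     # reported: 3 hidden layers
    hidden_width: int = 10                     # reported: 10 neurons per layer
    kan_grid_sizes: Tuple[int, ...] = (5, 10)  # grid sizes mentioned as 5 or 10
    J: Optional[int] = None                    # number of moment maps: not fully documented
    seed: Optional[int] = None                 # random seed: unspecified

    def missing_fields(self):
        """Names of settings that would have to be guessed to rerun the experiments."""
        return [name for name, value in vars(self).items() if value is None]
```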
Abstract

We study the approximation of operators acting on probability measures on a product space with prescribed marginal. Let $I$ be a label space endowed with a reference measure $\lambda$, and define $\mathcal{M}_\lambda$ as the set of probability measures on $I\times \mathbb{R}^d$ with first marginal $\lambda$. By disintegration, elements of $\mathcal{M}_\lambda$ correspond to families of labeled conditional distributions. Operators defined on this constrained measure space arise naturally in mean-field control problems with heterogeneous, non-exchangeable agents. Our main theoretical result establishes a universal approximation theorem for continuous operators on $\mathcal{M}_\lambda$. The proof combines cylindrical approximations of probability measures with a DeepONet-type branch-trunk neural architecture, yielding finite-dimensional representations of such operators. We further introduce a sampling strategy for generating training measures in $\mathcal{M}_\lambda$, enabling practical learning of such conditional mean-field operators. We apply the method to the numerical resolution of mean-field control problems with heterogeneous interactions, thereby extending previous neural approaches developed for homogeneous (exchangeable) systems. Numerical experiments illustrate the accuracy and computational effectiveness of the proposed framework.
