Your paper timeline
Scroll AI takes the way you would scroll a great paper aggregator: quick signal first, deeper critique when something earns your attention, and challenges when a claim feels off.
27 papers in stat.ML
Trending mixes fresh papers with community signal.
0
stat.MLcs.LGmath.DG Sing-Yuan Yeh, Yi-An Wu, Hau-Tieng Wu et al. · Mar 22, 2026

Vector Diffusion Maps (VDM) capture pairwise connection relationships in complex datasets via the Graph Connection Laplacian, but eigenvalue decomposition costs $O(n^{2.81})$, prohibiting large-scale applications. This paper proposes LA-VDM (Landmark Accelerated VDM), which constrains diffusion through landmark points and introduces a novel two-stage normalization scheme with parameters $\alpha$ and $\beta$ to handle non-uniform sampling densities in both data and landmarks. Under a manifold model with the frame bundle structure, the authors prove that LA-VDM asymptotically converges to the connection Laplacian while reducing complexity to $O(nm^2)$, enabling applications to datasets with millions of points.

We propose a landmark-constrained algorithm, LA-VDM (Landmark Accelerated Vector Diffusion Maps), to accelerate the Vector Diffusion Maps (VDM) framework built upon the Graph Connection Laplacian (GCL), which captures pairwise connection relationships within complex datasets. LA-VDM introduces a novel two-stage normalization that effectively address nonuniform sampling densities in both the data and the landmark sets. Under a manifold model with the frame bundle structure, we show that we can accurately recover the parallel transport with landmark-constrained diffusion from a point cloud, and hence asymptotically LA-VDM converges to the connection Laplacian. The performance and accuracy of LA-VDM are demonstrated through experiments on simulated datasets and an application to nonlocal image denoising.
0
cs.LGstat.COstat.ME Foo Hui-Mean, Yuan-chin I Chang · Mar 22, 2026

ALMAB-DC unifies Gaussian process active learning, multi-armed bandit scheduling, and asynchronous distributed computing to tackle expensive black-box optimization in sequential experimental design. The framework targets dose-finding, spatial field estimation, and ML/engineering tasks, claiming superior sample efficiency and near-linear parallel speedups up to $K=16$ agents. While the modular architecture and ablation analyses are rigorous, all empirical results derive from calibrated surrogate emulators rather than live systems, substantially limiting external validity.

Sequential experimental design under expensive, gradient-free objectives is a central challenge in computational statistics: evaluation budgets are tightly constrained and information must be extracted efficiently from each observation. We propose \textbf{ALMAB-DC}, a GP-based sequential design framework combining active learning, multi-armed bandits (MAB), and distributed asynchronous computing for expensive black-box experimentation. A Gaussian process surrogate with uncertainty-aware acquisition identifies informative query points; a UCB or Thompson-sampling bandit controller allocates evaluations across parallel workers; and an asynchronous scheduler handles heterogeneous runtimes. We present cumulative regret bounds for the bandit components and characterize parallel scalability via Amdahl's Law. We validate ALMAB-DC on five benchmarks. On the two statistical experimental-design tasks, ALMAB-DC achieves lower simple regret than Equal Spacing, Random, and D-optimal designs in dose--response optimization, and in adaptive spatial field estimation matches the Greedy Max-Variance benchmark while outperforming Latin Hypercube Sampling; at $K=4$ the distributed setting reaches target performance in one-quarter of sequential wall-clock rounds. On three ML/engineering tasks (CIFAR-10 HPO, CFD drag minimization, MuJoCo RL), ALMAB-DC achieves 93.4\% CIFAR-10 accuracy (outperforming BOHB by 1.7\,pp and Optuna by 1.1\,pp), reduces airfoil drag to $C_D = 0.059$ (36.9\% below Grid Search), and improves RL return by 50\% over Grid Search. All advantages over non-ALMAB baselines are statistically significant under Bonferroni-corrected Mann--Whitney $U$ tests. Distributed execution achieves $7.5\times$ speedup at $K = 16$ agents, consistent with Amdahl's Law.
0
stat.MLcs.LGmath.ST Yingzhen Yang, Ping Li · Mar 22, 2026

This paper studies nonparametric regression for learning degree-$k_0$ spherical polynomials on the unit sphere $\mathbb{S}^{d-1}$ using over-parameterized two-layer neural networks. The authors propose a novel Gradient Descent with Projection (GDP) algorithm that constrains learning to the top $r_0 = \Theta(d^{k_0})$ eigenspaces of the Neural Tangent Kernel (NTK). The main result establishes a nearly minimax optimal risk bound of order $\log(4/\delta) \cdot \Theta(d^{k_0}/n)$, improving the sample complexity from previous polynomial-in-$1/\varepsilon$ rates to linear $1/\varepsilon$ scaling.

We study the problem of learning a low-degree spherical polynomial of degree $k_0 = \Theta(1) \ge 1$ defined on the unit sphere in $\RR^d$ by training an over-parameterized two-layer neural network with augmented feature in this paper. Our main result is the significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\eps \in (0, \Theta(d^{-k_0})]$, an over-parameterized two-layer neural network trained by a novel Gradient Descent with Projection (GDP) requires a sample complexity of $n \asymp \Theta( \log(4/\delta) \cdot d^{k_0}/\eps)$ with probability $1-\delta$ for $\delta \in (0,1)$, in contrast with the representative sample complexity $\Theta(d^{k_0} \max\set{\eps^{-2},\log d})$. Moreover, such sample complexity is nearly unimprovable since the trained network renders a nearly optimal rate of the nonparametric regression risk of the order $\log({4}/{\delta}) \cdot \Theta(d^{k_0}/{n})$ with probability at least $1-\delta$. On the other hand, the minimax optimal rate for the regression risk with a kernel of rank $\Theta(d^{k_0})$ is $\Theta(d^{k_0}/{n})$, so that the rate of the nonparametric regression risk of the network trained by GDP is nearly minimax optimal. In the case that the ground truth degree $k_0$ is unknown, we present a novel and provable adaptive degree selection algorithm which identifies the true degree and achieves the same nearly optimal regression rate. To the best of our knowledge, this is the first time that a nearly optimal risk bound is obtained by training an over-parameterized neural network with a popular activation function (ReLU) and algorithmic guarantee for learning low-degree spherical polynomials. Due to the feature learning capability of GDP, our results are beyond the regular Neural Tangent Kernel (NTK) limit.
0
cs.LGcs.AIstat.ME Marc Franquesa Mon\'es, Jiaqi Zhang, Caroline Uhler · Mar 23, 2026

Constraint-based causal discovery algorithms like PC require exponentially many conditional independence (CI) tests in the worst case---specifically $p^{\mathcal{O}(d)}$ where $d$ is the maximum degree. This paper establishes that the fundamental complexity parameter is actually $s$, the maximum undirected clique size in the essential graph, which can be much smaller than $d$ (e.g., $s=2$ vs $d=p-2$ in Figure 1). The authors propose Greedy Ancestral Search (GAS), which achieves $p^{\mathcal{O}(s)}$ CI tests, and prove a matching lower bound of $2^{\Omega(s)}$, establishing exponent-optimality up to a logarithmic factor.

Learning causal relations from observational data is a fundamental problem with wide-ranging applications across many fields. Constraint-based methods infer the underlying causal structure by performing conditional independence tests. However, existing algorithms such as the prominent PC algorithm need to perform a large number of independence tests, which in the worst case is exponential in the maximum degree of the causal graph. Despite extensive research, it remains unclear if there exist algorithms with better complexity without additional assumptions. Here, we establish an algorithm that achieves a better complexity of $p^{\mathcal{O}(s)}$ tests, where $p$ is the number of nodes in the graph and $s$ denotes the maximum undirected clique size of the underlying essential graph. Complementing this result, we prove that any constraint-based algorithm must perform at least $2^{\Omega(s)}$ conditional independence tests, establishing that our proposed algorithm achieves exponent-optimality up to a logarithmic factor in terms of the number of conditional independence tests needed. Finally, we validate our theoretical findings through simulations, on semi-synthetic gene-expression data, and real-world data, demonstrating the efficiency of our algorithm compared to existing methods in terms of number of conditional independence tests needed.
0
cs.LGcs.AIstat.ML Abdou-Raouf Atarmla · Mar 23, 2026

Rule-State Inference (RSI) addresses compliance monitoring in domains like taxation where authoritative rules are known a priori but observations are partial, noisy, or strategically distorted. The paper proposes a Bayesian framework that inverts the standard ML paradigm: instead of learning rules from data, RSI encodes regulatory rules as structured priors and infers latent rule states (activation, compliance rate, parametric drift) via posterior inference. This enables zero-shot compliance assessment without labeled training data—a critical capability for low-resource environments where non-compliance labels are scarce or legally sensitive.

Existing machine learning frameworks for compliance monitoring -- Markov Logic Networks, Probabilistic Soft Logic, supervised models -- share a fundamental paradigm: they treat observed data as ground truth and attempt to approximate rules from it. This assumption breaks down in rule-governed domains such as taxation or regulatory compliance, where authoritative rules are known a priori and the true challenge is to infer the latent state of rule activation, compliance, and parametric drift from partial and noisy observations. We propose Rule-State Inference (RSI), a Bayesian framework that inverts this paradigm by encoding regulatory rules as structured priors and casting compliance monitoring as posterior inference over a latent rule-state space S = {(a_i, c_i, delta_i)}, where a_i captures rule activation, c_i models the compliance rate, and delta_i quantifies parametric drift. We prove three theoretical guarantees: (T1) RSI absorbs regulatory changes in O(1) time via a prior ratio correction, independently of dataset size; (T2) the posterior is Bernstein-von Mises consistent, converging to the true rule state as observations accumulate; (T3) mean-field variational inference monotonically maximizes the Evidence Lower BOund (ELBO). We instantiate RSI on the Togolese fiscal system and introduce RSI-Togo-Fiscal-Synthetic v1.0, a benchmark of 2,000 synthetic enterprises grounded in real OTR regulatory rules (2022-2025). Without any labeled training data, RSI achieves F1=0.519 and AUC=0.599, while absorbing regulatory changes in under 1ms versus 683-1082ms for full model retraining -- at least a 600x speedup.
0
stat.MLcs.AIcs.CL Oussama Zekri, Th\'eo Uscidda, Nicolas Boull\'e et al. · Mar 22, 2026

Discrete diffusion models have been limited to simplistic noising schemes like uniform corruption or masking, restricting their ability to leverage semantic structure in large vocabularies. This paper introduces GDDS (Generalized Discrete Diffusion from Snapshots), a framework supporting arbitrary continuous-time Markov chain noising processes via exact uniformization-based sampling and a tractable snapshot-level ELBO. The work achieves state-of-the-art results on large-scale language modeling tasks, claiming to surpass autoregressive baselines for the first time at this scale.

We introduce Generalized Discrete Diffusion from Snapshots (GDDS), a unified framework for discrete diffusion modeling that supports arbitrary noising processes over large discrete state spaces. Our formulation encompasses all existing discrete diffusion approaches, while allowing significantly greater flexibility in the choice of corruption dynamics. The forward noising process relies on uniformization and enables fast arbitrary corruption. For the reverse process, we derive a simple evidence lower bound (ELBO) based on snapshot latents, instead of the entire noising path, that allows efficient training of standard generative modeling architectures with clear probabilistic interpretation. Our experiments on large-vocabulary discrete generation tasks suggest that the proposed framework outperforms existing discrete diffusion methods in terms of training efficiency and generation quality, and beats autoregressive models for the first time at this scale. We provide the code along with a blog post on the project page : \href{https://oussamazekri.fr/gdds}{https://oussamazekri.fr/gdds}.
0
stat.MLcs.AIcs.CV Osamu Hirose, Emanuele Rodola · Mar 22, 2026

Domain Elastic Transform (DET) addresses the registration of high-dimensional vector-valued functions on irregular, sparse manifolds—a critical bottleneck in spatial transcriptomics where gene expression data resides on scattered cell positions rather than regular grids. The core idea is a Bayesian framework that treats registration as elastic domain deformation guided by a joint spatial-functional likelihood, bypassing the lossy voxelization required by image-based methods while exploiting functional signals that pure geometric point-set registration ignores. This matters because it enables training-free analysis of massive atlases (e.g., MERFISH, Stereo-seq) without sacrificing single-cell resolution.

Nonrigid registration is conventionally divided into point set registration, which aligns sparse geometries, and image registration, which aligns continuous intensity fields on regular grids. However, this dichotomy creates a critical bottleneck for emerging scientific data, such as spatial transcriptomics, where high-dimensional vector-valued functions, e.g., gene expression, are defined on irregular, sparse manifolds. Consequently, researchers currently face a forced choice: either sacrifice single-cell resolution via voxelization to utilize image-based tools, or ignore the critical functional signal to utilize geometric tools. To resolve this dilemma, we propose Domain Elastic Transform (DET), a grid-free probabilistic framework that unifies geometric and functional alignment. By treating data as functions on irregular domains, DET registers high-dimensional signals directly without binning. We formulate the problem within a rigorous Bayesian framework, modeling domain deformation as an elastic motion guided by a joint spatial-functional likelihood. The method is fully unsupervised and scalable, utilizing feature-sensitive downsampling to handle massive atlases. We demonstrate that DET achieves 92\% topological preservation on MERFISH data where state-of-the-art optimal transport methods struggle ($<$5\%), and successfully registers whole-embryo Stereo-seq atlases across developmental stages -- a task involving massive scale and complex nonrigid growth. The implementation of DET is available on {https://github.com/ohirose/bcpd} (since Mar, 2025).