Your paper timeline
Scroll AI takes the way you would scroll a great paper aggregator: quick signal first, deeper critique when something earns your attention, and challenges when a claim feels off.
2 papers in stat.TH
Trending mixes fresh papers with community signal.
0
math.STcs.LGstat.TH Jing Zhang, Chenping Hou · Mar 23, 2026

Feature incremental clustering addresses dynamic scenarios where data arrives in expanding feature spaces—such as activity recognition systems that acquire new sensors over time. This paper proposes four k-means-based algorithms (FIC-FT, FIC-DR, FIC-DA, FIC-MR) tailored to different data-access constraints, from full historical access to model-only reuse. The core theoretical contribution establishes generalization error bounds for all four settings, revealing that model reuse (FIC-MR) can achieve a fast $\tilde{\mathcal{O}}(1/n_2)$ convergence rate when the pre-trained model aligns well with the current distribution.

In many learning systems, such as activity recognition systems, as new data collection methods continue to emerge in various dynamic environmental applications, the attributes of instances accumulate incrementally, with data being stored in gradually expanding feature spaces. How to design theoretically guaranteed algorithms to effectively cluster this special type of data stream, commonly referred to as activity recognition, remains unexplored. Compared to traditional scenarios, we will face at least two fundamental questions in this feature incremental scenario. (i) How to design preliminary and effective algorithms to address the feature incremental clustering problem? (ii) How to analyze the generalization bounds for the proposed algorithms and under what conditions do these algorithms provide a strong generalization guarantee? To address these problems, by tailoring the most common clustering algorithm, i.e., $k$-means, as an example, we propose four types of Feature Incremental Clustering (FIC) algorithms corresponding to different situations of data access: Feature Tailoring (FT), Data Reconstruction (DR), Data Adaptation (DA), and Model Reuse (MR), abbreviated as FIC-FT, FIC-DR, FIC-DA, and FIC-MR. Subsequently, we offer a detailed analysis of the generalization error bounds for these four algorithms and highlight the critical factors influencing these bounds, such as the amounts of training data, the complexity of the hypothesis space, the quality of pre-trained models, and the discrepancy of the reconstruction feature distribution. The numerical experiments show the effectiveness of the proposed algorithms, particularly in their application to activity recognition clustering tasks.
0
stat.MLcs.LGmath.ST Yingzhen Yang, Ping Li · Mar 22, 2026

This paper studies nonparametric regression for learning degree-$k_0$ spherical polynomials on the unit sphere $\mathbb{S}^{d-1}$ using over-parameterized two-layer neural networks. The authors propose a novel Gradient Descent with Projection (GDP) algorithm that constrains learning to the top $r_0 = \Theta(d^{k_0})$ eigenspaces of the Neural Tangent Kernel (NTK). The main result establishes a nearly minimax optimal risk bound of order $\log(4/\delta) \cdot \Theta(d^{k_0}/n)$, improving the sample complexity from previous polynomial-in-$1/\varepsilon$ rates to linear $1/\varepsilon$ scaling.

We study the problem of learning a low-degree spherical polynomial of degree $k_0 = \Theta(1) \ge 1$ defined on the unit sphere in $\RR^d$ by training an over-parameterized two-layer neural network with augmented feature in this paper. Our main result is the significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\eps \in (0, \Theta(d^{-k_0})]$, an over-parameterized two-layer neural network trained by a novel Gradient Descent with Projection (GDP) requires a sample complexity of $n \asymp \Theta( \log(4/\delta) \cdot d^{k_0}/\eps)$ with probability $1-\delta$ for $\delta \in (0,1)$, in contrast with the representative sample complexity $\Theta(d^{k_0} \max\set{\eps^{-2},\log d})$. Moreover, such sample complexity is nearly unimprovable since the trained network renders a nearly optimal rate of the nonparametric regression risk of the order $\log({4}/{\delta}) \cdot \Theta(d^{k_0}/{n})$ with probability at least $1-\delta$. On the other hand, the minimax optimal rate for the regression risk with a kernel of rank $\Theta(d^{k_0})$ is $\Theta(d^{k_0}/{n})$, so that the rate of the nonparametric regression risk of the network trained by GDP is nearly minimax optimal. In the case that the ground truth degree $k_0$ is unknown, we present a novel and provable adaptive degree selection algorithm which identifies the true degree and achieves the same nearly optimal regression rate. To the best of our knowledge, this is the first time that a nearly optimal risk bound is obtained by training an over-parameterized neural network with a popular activation function (ReLU) and algorithmic guarantee for learning low-degree spherical polynomials. Due to the feature learning capability of GDP, our results are beyond the regular Neural Tangent Kernel (NTK) limit.