Quotient Geometry, Effective Curvature, and Implicit Bias in Simple Shallow Neural Networks
This paper develops a differential-geometric framework for shallow neural networks that treats predictor classes rather than raw parameters as the fundamental objects. By quotienting out permutation and scaling symmetries on a regular set $\Theta_{\mathrm{reg}}$, the authors define a function-induced metric $g_\theta$ and an effective Hessian that removes spurious curvature degeneracies along symmetry orbits. The work connects implicit bias to quotient-level geometry, with concrete analysis for quadratic-activation models where parameters map explicitly to symmetric matrices $Q(\theta)=\sum_{i=1}^m a_i w_i w_i^\top$.
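As a small illustration of why $Q(\theta)$ can stand in for the predictor class in the quadratic-activation case, the NumPy sketch below checks that $Q$ is unchanged when hidden units are permuted and when each unit is rescaled by $(a_i, w_i) \mapsto (a_i/c_i^2,\, c_i w_i)$. The function name `realize_Q`, the dimensions, and this particular form of the scaling action are assumptions made for the sketch; they are not taken from the paper.

```python
import numpy as np

def realize_Q(a, W):
    """Q(theta) = sum_i a_i w_i w_i^T, with a of shape (m,) and W of shape (m, d)."""
    return (W * a[:, None]).T @ W

rng = np.random.default_rng(0)
m, d = 4, 3                      # illustrative sizes (assumption)
a = rng.normal(size=m)
W = rng.normal(size=(m, d))
Q = realize_Q(a, W)

# Permutation symmetry: reorder the hidden units.
perm = rng.permutation(m)
Q_perm = realize_Q(a[perm], W[perm])

# Scaling symmetry (assumed form): a_i -> a_i / c_i^2, w_i -> c_i w_i.
c = rng.uniform(0.5, 2.0, size=m)
Q_scaled = realize_Q(a / c**2, W * c[:, None])

print(np.allclose(Q, Q_perm), np.allclose(Q, Q_scaled))
```

All three parameter vectors differ in the ambient space but yield the same symmetric matrix, and hence the same predictor $x \mapsto x^\top Q x$.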
The paper offers a rigorous formalization of the intuition that ambient parameter-space geometry is redundant for shallow networks. The construction of the quotient manifold $\mathcal{M}_{\mathrm{reg}}=\Theta_{\mathrm{reg}}/G$ and the proof that the realization map descends to a locally injective immersion together provide a solid foundation for the subsequent curvature analysis. However, the framework is limited to the regular set where neurons are distinct and active; it excludes the singular configurations in which optimization actually collapses neurons, a limitation the authors acknowledge but do not resolve algorithmically.
The mathematical exposition is clear and cites appropriate differential-geometric foundations from Lee (2003) and Absil et al. (2008). The distinction between vertical (symmetry) and horizontal (predictor-changing) components of parameter motion is well-developed, and the proof that only horizontal components contribute to first-order function evolution correctly formalizes gauge invariance in gradient flows. The quadratic-activation instantiation validates that the abstract quotient object can be represented concretely by a symmetric matrix $Q$, linking the theory to established matrix factorization geometry.
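The gauge-invariance statement can be checked numerically in the quadratic-activation case. The sketch below builds the Jacobian of a finite-sample realization map $\Phi_X(\theta)_j = \sum_i a_i (w_i^\top x_j)^2$, verifies that the generators of the per-unit scaling action lie in its kernel, and confirms that removing the component of a velocity along those generators leaves the first-order change in predictions untouched. The helper names, the assumed scaling action $(a_i e^{-2t}, e^{t} w_i)$, and the use of Euclidean orthogonality for the split are assumptions of this sketch; the paper's horizontal distribution is defined through its own metric and full symmetry group.

```python
import numpy as np

def jacobian(a, W, X):
    """Jacobian of Phi_X(theta)_j = sum_i a_i (w_i . x_j)^2 w.r.t. theta = (a, W.ravel())."""
    n = X.shape[0]
    Z = X @ W.T                                       # Z[j, i] = w_i . x_j
    J_a = Z**2                                        # d f_j / d a_i
    J_W = 2.0 * (Z * a)[:, :, None] * X[:, None, :]   # d f_j / d w_i, shape (n, m, d)
    return np.concatenate([J_a, J_W.reshape(n, -1)], axis=1)

def scaling_generators(a, W):
    """Columns are tangents at t=0 of (a_i e^{-2t}, e^{t} w_i), one per hidden unit."""
    m, d = W.shape
    V = np.zeros((m + m * d, m))
    for i in range(m):
        V[i, i] = -2.0 * a[i]
        V[m + i * d : m + (i + 1) * d, i] = W[i]
    return V

rng = np.random.default_rng(1)
m, d, n = 3, 4, 12               # illustrative sizes (assumption)
a = rng.normal(size=m)
W = rng.normal(size=(m, d))
X = rng.normal(size=(n, d))

J = jacobian(a, W, X)
V = scaling_generators(a, W)

# Split an arbitrary velocity into its part along the scaling orbits and the
# Euclidean-orthogonal remainder (Euclidean orthogonality is an assumption here).
v = rng.normal(size=m + m * d)
coeffs, *_ = np.linalg.lstsq(V, v, rcond=None)
v_vert = V @ coeffs
v_rest = v - v_vert

print(np.abs(J @ V).max())                # scaling directions do not move predictions
print(np.abs(J @ v - J @ v_rest).max())   # dropping them leaves J v unchanged
```

Both printed quantities should be at the level of floating-point round-off, which is the finite-sample analogue of the claim that vertical motion is pure gauge variation.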
The framework explicitly excludes singular parameter configurations (vanishing neurons, weight collisions, and rank-deficient representations), which form a stratified set $\Theta_{\mathrm{sing}}$ and are the configurations that optimization dynamics approach when neurons collapse. As noted in Proposition 2.2, at these points the quotient fails to be a manifold and the kernel of $D\Phi_X$ strictly exceeds the tangent space to the orbit, so the theoretical assumptions no longer hold. Furthermore, the effective curvature computation requires explicit knowledge of the horizontal distribution $\mathcal{H}_\theta$, which is computationally impractical to obtain in standard gradient descent implementations; the framework is therefore primarily descriptive rather than algorithmic.
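The kernel enlargement at singular points is easy to see in a toy case. The sketch below uses a single quadratic-activation unit $f(x) = a\,(w^\top x)^2$ and compares the nullity of the finite-sample Jacobian at a regular point ($a \neq 0$) with the nullity at a vanishing-neuron point ($a = 0$). The single-unit setting, the helper names, and the rank tolerance are assumptions chosen for illustration; the scaling orbit through either point is one-dimensional.

```python
import numpy as np

def jacobian_single_unit(a, w, X):
    """Jacobian of f(x_j) = a (w . x_j)^2 w.r.t. (a, w); shape (n, 1 + d)."""
    z = X @ w
    return np.column_stack([z**2, 2.0 * a * z[:, None] * X])

def nullity(J, tol=1e-10):
    """Dimension of the kernel of J, using an absolute singular-value threshold."""
    return J.shape[1] - np.linalg.matrix_rank(J, tol=tol)

rng = np.random.default_rng(2)
d, n = 3, 10                     # illustrative sizes (assumption)
w = rng.normal(size=d)
X = rng.normal(size=(n, d))

null_regular = nullity(jacobian_single_unit(1.0, w, X))
null_vanishing = nullity(jacobian_single_unit(0.0, w, X))

print(null_regular, null_vanishing)
```

At the regular point the kernel is spanned by the scaling generator alone, whereas at $a = 0$ every perturbation of $w$ is invisible to the predictor, so the kernel has dimension $d$ and strictly contains the orbit direction.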
The paper appropriately contextualizes its contribution within the literature on overparameterized two-layer networks (Du and Lee, 2018; Chizat and Bach, 2018) and matrix factorization (Gunasekar et al., 2017). However, the experimental validation is limited to the quadratic-activation model, and the abstract claims about general shallow networks are not fully supported by empirical comparison with other flatness measures or symmetry-handling techniques outside this special case. The theoretical comparisons to quotient-manifold optimization are fair, though the paper does not demonstrate practical advantages over standard Euclidean optimization in the experiments described.
The paper is primarily theoretical with minimal experimental detail provided in the available text. While the geometric constructions are mathematically well-defined, specific hyperparameters, initialization schemes, and dataset descriptions for the numerical experiments mentioned in the abstract are absent. No code repository or supplementary material is referenced, and reproducing the effective curvature calculations would require substantial implementation of the quotient geometry algorithms not provided in the manuscript.
Overparameterized shallow neural networks admit substantial parameter redundancy: distinct parameter vectors may represent the same predictor due to hidden-unit permutations, rescalings, and related symmetries. As a result, geometric quantities computed directly in the ambient Euclidean parameter space can reflect artifacts of representation rather than intrinsic properties of the predictor. In this paper, we develop a differential-geometric framework for analyzing simple shallow networks through the quotient space obtained by modding out parameter symmetries on a regular set. We first characterize the symmetry and quotient structure of regular shallow-network parameters and show that the finite-sample realization map induces a natural metric on the quotient manifold. This leads to an effective notion of curvature that removes degeneracy along symmetry orbits and yields a symmetry-reduced Hessian capturing intrinsic local geometry. We then study gradient flows on the quotient and show that only the horizontal component of parameter motion contributes to first-order predictor evolution, while the vertical component corresponds purely to gauge variation. Finally, we formulate an implicit-bias viewpoint at the quotient level, arguing that meaningful complexity should be assigned to predictor classes rather than to individual parameter representatives. Our experiments confirm that ambient flatness is representation-dependent, that local dynamics are better organized by quotient-level curvature summaries, and that in underdetermined regimes, implicit bias is most naturally described in quotient coordinates.
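The claim that ambient flatness is representation-dependent can be illustrated with a minimal sketch in the quadratic-activation model. Assuming a squared loss and labels generated by the network itself (so the parameter is a zero-residual minimizer and the Hessian equals $J^\top J$), the code compares the Hessian trace at two parameter vectors related by the assumed rescaling $(a_i, w_i) \mapsto (a_i/c^2,\, c\,w_i)$. The function names, sizes, and the choice $c = 3$ are hypothetical and do not reproduce the paper's experiments.

```python
import numpy as np

def predictions(a, W, X):
    """f(x_j) = sum_i a_i (w_i . x_j)^2 for every row x_j of X."""
    return (X @ W.T) ** 2 @ a

def gauss_newton_trace(a, W, X):
    """tr(J^T J) = ||J||_F^2 for the Jacobian of the predictions w.r.t. (a, W)."""
    Z = X @ W.T                                   # Z[j, i] = w_i . x_j
    sq_norms = np.sum(X**2, axis=1)[:, None]      # ||x_j||^2, shape (n, 1)
    return np.sum(Z**4) + np.sum(4.0 * (a * Z) ** 2 * sq_norms)

rng = np.random.default_rng(3)
m, d, n = 3, 4, 20               # illustrative sizes (assumption)
a = rng.normal(size=m)
W = rng.normal(size=(m, d))
X = rng.normal(size=(n, d))

c = 3.0                          # gauge transformation (assumed scaling action)
a_s, W_s = a / c**2, c * W

same_predictor = np.allclose(predictions(a, W, X), predictions(a_s, W_s, X))
trace_original = gauss_newton_trace(a, W, X)
trace_rescaled = gauss_newton_trace(a_s, W_s, X)

print(same_predictor, trace_original, trace_rescaled)
```

The two parameter vectors define the same predictor on the sample, yet the ambient Hessian trace differs between them, which is the kind of artifact a symmetry-reduced curvature measure is meant to remove.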