SGAD-SLAM: Splatting Gaussians at Adjusted Depth for Better Radiance Fields in RGBD SLAM

cs.CV Pengchong Hu, Zhizhong Han · Mar 22, 2026


AI Review

Plain-language introduction

RGBD SLAM with 3D Gaussian Splatting (3DGS) struggles to balance scalability against rendering fidelity: global Gaussians consume excessive GPU memory, while view-tied Gaussians (fixed at depth) suffer from limited novel-view quality. This paper proposes pixel-aligned Gaussians that can adjust their positions along viewing rays via learned depth offsets, paired with a fast geometry-similarity tracking strategy using Generalized ICP on depth distributions. The approach claims state-of-the-art rendering and tracking performance while maintaining smaller active memory footprints than prior 3DGS-based methods.
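
To make the ray-offset idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a pinhole camera with intrinsics K, a z-depth image, and a camera-to-world pose. The function name gaussian_centers and the offsets array (standing in for the learned depth offsets) are hypothetical.

```python
import numpy as np

def gaussian_centers(depth, offsets, K, cam_to_world):
    """Place one Gaussian per pixel at (depth + offset) along its viewing ray.

    depth, offsets: (h, w) arrays; K: 3x3 intrinsics; cam_to_world: 4x4 pose.
    All names are illustrative assumptions, not taken from the paper.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Camera-frame ray through each pixel, scaled so that its z component is 1.
    rays = pix @ np.linalg.inv(K).T
    # Adjusted depth: the observed depth plus a per-pixel offset.
    z = depth + offsets
    pts_cam = rays * z[..., None]
    # Transform the points from the camera frame to the world frame.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t
```

Because each center is the ray direction scaled by depth plus offset, an offset can only slide a Gaussian along its own ray. This is the constraint the paper contrasts with fixed-depth and freely moving Gaussians.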

Critical review

Verdict

Bottom line

The paper makes a technically sound contribution that improves on the authors' prior VTGS-SLAM by letting Gaussians move along their viewing rays. The depth-offset mechanism is well validated through ablations, and the simplified Gaussian representation (attributes reduced from 14 to 5) directly addresses memory concerns. However, the experimental methodology raises fairness questions about the use of ground-truth depth for initialization, and the scalability claims rely on multi-GPU configurations that are not standard in the field. The tracking speed is impressive, but the comparisons against loop-closure methods, explicitly acknowledged as 'unfair' by the authors, undermine some of the quantitative superiority claims.

“To initialize Gaussians from ground truth depth images, we first inpaint the missing depth values using the neighboring pixels.”

Hu & Han, Sec. 4.1

“Note that * indicates methods relying on pre-trained data-driven priors.”

Hu & Han, Table 3 caption
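
As a rough illustration of what the 14-to-5 attribute reduction means for storage, the sketch below counts bytes for one frame of pixel-aligned Gaussians. It assumes one Gaussian per pixel, 32-bit floats for every attribute, and a hypothetical 640x480 frame. Only the counts 14 and 5 come from the review; the other figures are assumptions.

```python
def bytes_per_gaussian(num_attributes, bytes_per_value=4):
    """Storage for one Gaussian, assuming every attribute is one float32."""
    return num_attributes * bytes_per_value

def frame_megabytes(width, height, num_attributes, bytes_per_value=4):
    """Storage for one Gaussian per pixel of a width x height frame, in MB."""
    total = width * height * bytes_per_gaussian(num_attributes, bytes_per_value)
    return total / 1e6

# Hypothetical frame size; 14 and 5 are the attribute counts cited above.
full = frame_megabytes(640, 480, 14)
slim = frame_megabytes(640, 480, 5)
print(f"14 attributes: {full:.2f} MB per frame")
print(f" 5 attributes: {slim:.2f} MB per frame")
print(f"ratio: {slim / full:.3f}")
```

Under these assumptions the per-frame cost scales linearly with the attribute count, so the simplified representation needs 5/14 of the original storage.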

What holds up

The core idea of allowing ray-offset movement for pixel-aligned Gaussians is well motivated and rigorously ablated. Table 9 shows that fixing Gaussians at depth (Mode3) yields a PSNR of 37.16, while allowing full 3D movement (Mode1) with the same number of Gaussians drops PSNR to 20.90 due to under-convergence; the proposed ray-constrained movement (Ours) achieves 38.60. The simplified Gaussian representation, which omits rotation and two variance terms, reduces storage while maintaining quality. The tracking strategy, which applies GICP to depth-point distributions rather than to appearance Gaussians, delivers remarkable speed (0.01 s/frame) and robustness to noise (Table 10 shows stable performance with 10-40% additional depth noise).

“Mode3... PSNR↑ 37.16... Ours... PSNR↑ 38.60”

Hu & Han, Table 9

“Ours†: Parallel on 8 GPUs... Tracking /Frame(s)↓ 0.01”

Hu & Han, Sec. 4.3
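
The review text does not spell out how the per-pixel depth distributions are built. One plausible construction, shown here purely as an assumption, back-projects the depth image and fits a 3D mean and covariance to the points in a small window around each pixel. Per-point covariances of this kind are what a GICP-style alignment weights its residuals with. The function name, the window size, and the pinhole intrinsics K are hypothetical.

```python
import numpy as np

def pixel_point_distributions(depth, K, window=3):
    """Mean and covariance of back-projected points around each interior pixel.

    Returns arrays of shape (h-2r, w-2r, 3) and (h-2r, w-2r, 3, 3), where
    r = window // 2. Illustrative only; not the paper's formulation.
    """
    h, w = depth.shape
    r = window // 2
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel to a 3D point in the camera frame.
    pts = np.stack([(u - cx) / fx * depth, (v - cy) / fy * depth, depth], axis=-1)
    # Collect the window x window neighbours of every interior pixel.
    patches = np.stack(
        [pts[dy:h - 2 * r + dy, dx:w - 2 * r + dx]
         for dy in range(window) for dx in range(window)],
        axis=2,
    )
    mean = patches.mean(axis=2)
    centered = patches - mean[:, :, None, :]
    # Unbiased sample covariance over the neighbours in each window.
    cov = np.einsum("hwni,hwnj->hwij", centered, centered) / (window * window - 1)
    return mean, cov
```

On a flat, fronto-parallel surface the covariance has no spread along the depth axis, so an alignment that uses these covariances penalizes depth-direction misfit most strongly there.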

Main concerns

The reliance on ground-truth depth for Gaussian initialization significantly limits real-world applicability. The authors state that they 'initialize Gaussians from ground truth depth images' after inpainting, yet they do not clarify whether this dependency is removed during tracking or how the system handles pure odometry scenarios without GT depth. Loop-closure methods (Loopy-SLAM, LoopSplat, CG-SLAM) are marked with asterisks to indicate reliance on pre-trained priors, which the authors call an 'unfair advantage'; nevertheless, the paper claims superiority over these methods without caveat in the main results (e.g., 'best tracking performance in 6 out of 8 scenes'). The claims of scalability to 'large scenes' rely on parallelizing across 8 GPUs ('Ours†' in Table 8), a hardware configuration most research groups cannot replicate, and the single-GPU results still require maintaining a global Gaussian set $T$ that grows with scene size.

“To initialize Gaussians from ground truth depth images, we first inpaint [5] the missing depth values using the neighboring pixels.”

Hu & Han, Sec. 4.1

“Our method also shows improvements over methods that employ data-driven priors for loop detection with additional pose graph optimization, such as LoopSplat, CG-SLAM, and Loopy-SLAM.”

Hu & Han, Sec. 4.2

“Ours†: Parallel on 8 GPUs... Total /Frame(s)↓ 0.16”

Hu & Han, Table 8
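
The inpainting method cited as [5] is not identified in the quoted text, so the following is only a simple stand-in that shows what filling missing depth "using the neighboring pixels" could look like. It assumes missing depth is encoded as 0 and repeatedly replaces each hole with the mean of its valid 4-neighbours. The function name and the iteration cap are hypothetical.

```python
import numpy as np

def fill_missing_depth(depth, max_iters=100):
    """Fill zero-valued (missing) depths from valid 4-neighbours.

    A neighbour-averaging stand-in, not the inpainting method the paper cites.
    """
    d = depth.astype(np.float64).copy()
    for _ in range(max_iters):
        missing = d == 0
        if not missing.any():
            break
        # Zero padding makes out-of-image neighbours count as missing.
        padded = np.pad(d, 1)
        neigh = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                          padded[1:-1, :-2], padded[1:-1, 2:]])
        valid = neigh > 0
        count = valid.sum(axis=0)
        total = (neigh * valid).sum(axis=0)
        fillable = missing & (count > 0)
        if not fillable.any():
            break  # no hole has a valid neighbour (e.g. the image is all zeros)
        d[fillable] = total[fillable] / count[fillable]
    return d
```

Each pass grows the valid region inward by one pixel, so large holes are filled from their borders. Reproducing the paper's numbers would still require the exact method and parameters behind [5].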

Evidence and comparison

The evidence strongly supports the claim that ray-aligned offsets improve rendering over fixed-depth Gaussians (Fig. 3 and Table 9). However, the rendering comparison with SplaTAM and Gaussian-SLAM (Table 1) may conflate the benefits of the representation with those of using GT depth for initialization. The tracking accuracy on Replica (Table 3) matches GS-ICP SLAM (0.16 cm average ATE RMSE), which operates without the proposed depth-offset mechanism; this suggests the tracking gains may stem from the GICP implementation rather than from the radiance field representation. The ScanNet++ experiments (Table 6) show that rendering-based initialization ('Ours') dramatically improves on the constant-speed assumption (average error 0.59 versus 6.5 without it), but this improvement is measured against an ablated version of the authors' own method rather than against baseline competitors.

“GS-ICP SLAM... Avg. 0.16... Ours... Avg. 0.16”

Hu & Han, Table 3
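
For readers unfamiliar with the metric in Table 3, ATE RMSE is commonly computed by rigidly aligning the estimated camera positions to the ground truth and taking the root-mean-square of the remaining position errors. The sketch below uses a Kabsch (SVD) alignment. The exact evaluation protocol used in the paper is not stated in the quoted text, so treat the alignment choice as an assumption; ate_rmse is a hypothetical name.

```python
import numpy as np

def ate_rmse(est_positions, gt_positions):
    """RMSE of position error after rigid (rotation + translation) alignment."""
    est = np.asarray(est_positions, dtype=np.float64)
    gt = np.asarray(gt_positions, dtype=np.float64)
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    # Cross-covariance between the centred estimate and ground truth.
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

The result is in the same units as the input positions, so trajectories given in metres need a factor of 100 to compare with values reported in centimetres.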

“Ours(w/o Initialization)... Avg. 6.5... Ours... Avg. 0.59”

Hu & Han, Table 6
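
The constant-speed assumption that serves as the ablation baseline in Table 6 is usually implemented by repeating the last relative camera motion. A minimal sketch, assuming 4x4 camera-to-world poses and a motion expressed in the camera's own frame (the paper's exact convention is not given in the quoted text; the function name is hypothetical):

```python
import numpy as np

def constant_velocity_guess(pose_prev2, pose_prev1):
    """Predict the next camera-to-world pose by repeating the last motion.

    pose_prev2 and pose_prev1 are the poses at frames t-2 and t-1.
    """
    # Relative motion from frame t-2 to t-1, expressed in the camera frame.
    delta = np.linalg.inv(pose_prev2) @ pose_prev1
    # Apply the same motion once more, starting from the latest pose.
    return pose_prev1 @ delta
```

Such a guess degrades when the camera accelerates or frames are far apart, which is consistent with the large error reported for the variant without rendering-based initialization.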

Reproducibility

The paper refers to a project page for code and videos, but the repository was not available at the time of submission. Critical hyperparameters (e.g., the balance weights $\rho, \tau, \sigma$ in Eq. 1, the number of neighboring frames $NN$, and the GICP iteration counts) are relegated to the supplementary material rather than given in the main text. The method requires depth inpainting for initialization, which is non-trivial to reproduce without exact implementation details. While the core algorithms (GICP, 3DGS splatting) are standard, reproducing the reported 0.16 s total runtime per frame requires the specific 8-GPU parallelization strategy described in Table 8, which is not standard hardware for SLAM evaluation.

“Further details are provided in the supplementary material.”

Hu & Han, Sec. 4.1

“Please see our project page for code and videos at https://machineperceptionlab.github.io/SGAD-SLAM-Project”

Hu & Han, Abstract

Abstract

3D Gaussian Splatting (3DGS) has made remarkable progress in RGBD SLAM. Current methods usually use 3D Gaussians or view-tied 3D Gaussians to represent radiance fields in tracking and mapping. However, these Gaussians are either too flexible or too limited in movements, resulting in slow convergence or limited rendering quality. To resolve this issue, we adopt pixel-aligned Gaussians but allow each Gaussian to adjust its position along its ray to maximize the rendering quality, even if Gaussians are simplified to improve system scalability. To speed up the tracking, we model the depth distribution around each pixel as a Gaussian distribution, and then use these distributions to align each frame to the 3D scene quickly. We report our evaluations on widely used benchmarks, justify our designs, and show advantages over the latest methods in view rendering, camera tracking, runtime, and storage complexity. Please see our project page for code and videos at https://machineperceptionlab.github.io/SGAD-SLAM-Project .
