FreeArtGS: Articulated Gaussian Splatting Under Free-moving Scenario

cs.CV cs.GR cs.RO Hang Dai, Hongwei Fan, Han Zhang, Duojin Wu, Jiyao Zhang, Hao Dong · Mar 23, 2026
Plain-language introduction

Articulated object reconstruction typically requires either multi-view capture of discrete states or monocular video with a strict static-base-part assumption, limiting practical deployment. FreeArtGS introduces a "free-moving" setting where both joint angles and object poses vary arbitrarily during capture, using only a monocular RGB-D video. The method combines motion-based part segmentation via point tracking priors with joint estimation and 3D Gaussian Splatting optimization to jointly reconstruct geometry, appearance, and articulation.

Critical review
Verdict
Bottom line

FreeArtGS presents a well-engineered solution to the free-moving articulated reconstruction problem, achieving strong quantitative results (approximately 1° axis error on the proposed FreeArt-21 benchmark). The three-stage pipeline demonstrates clear benefits over baselines in both free-moving and static-base settings. However, the approach is restricted to two-part objects and relies on a cascade of off-the-shelf pose estimators and trackers.

“FreeArtGS still has several limitations. First, our method currently assumes a two-part articulated object; extending it to multi-part structures by sequentially capturing each moving part remains an important direction. Second, relying on multiple off-the-shelf priors can lead to cascading error accumulation.”
Dai et al., Sec. 5
What holds up

The motion-based part segmentation (Section 3.2) effectively leverages dense point tracking and DINOv3 features to identify rigid parts without static-base assumptions, with entropy $\mathcal{L}_{\mathrm{ent}}=-\sum_{p}[w_{t,p}\log w_{t,p}+(1-w_{t,p})\log(1-w_{t,p})]$ and smoothness regularization preventing degenerate solutions. The joint estimation module robustly recovers parameters via pairwise relative transforms and outlier filtering, achieving axis errors of $1.04\pm1.03$° on revolute joints. The end-to-end optimization jointly refines geometry and articulation, with blended rendering $\mathcal{G}_i = w(\mathcal{G}_c \circ I) \cup (1-w)(\mathcal{G}_c \circ \mathcal{J}_i)$ improving visual fidelity.

“$\mathcal{L}_{\mathrm{ent}}=-\sum_{p}\Big[w_{t,p}\log w_{t,p}+(1-w_{t,p})\log(1-w_{t,p})\Big]$”
Dai et al., Sec. 3.2
“our method achieves an average error of around 1 degree in axis angle and less than 1 cm in geometry”
Dai et al., Sec. 4.3 · Table 1
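To make the entropy term quoted above concrete, here is a minimal NumPy sketch for a single frame $t$. It assumes $w_{t,p}$ is a soft part-assignment weight in $[0, 1]$ for each tracked point $p$; the function name `binary_entropy_loss` and the clipping constant `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np

def binary_entropy_loss(w, eps=1e-8):
    """Sum of per-point binary entropies for soft part weights w in [0, 1].

    Assumption: w holds the weights w_{t,p} of one frame t; eps only guards
    against log(0) and is not a value from the paper.
    """
    w = np.clip(np.asarray(w, dtype=np.float64), eps, 1.0 - eps)
    return float(-np.sum(w * np.log(w) + (1.0 - w) * np.log(1.0 - w)))

# Near-binary weights give a small loss; undecided weights give the maximum.
confident = binary_entropy_loss([0.01, 0.99, 0.02])
ambiguous = binary_entropy_loss([0.5, 0.5, 0.5])
print(confident < ambiguous)
```

Minimizing this quantity pushes each weight toward 0 or 1, which is consistent with the review's reading that the term discourages undecided part assignments.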
Main concerns

The method is explicitly limited to two-part articulated objects, excluding common multi-joint mechanisms. The heavy reliance on off-the-shelf models (AllTracker, BundleSDF, DINOv3) creates vulnerability to cascading failures; while Table 4 shows robustness to component swaps, the ablation does not quantify failure modes when these pretrained models produce systematic errors. Additionally, the FreeArt-21 benchmark comprises only 21 simulated objects, potentially limiting generalization claims for diverse real-world scenes.

“First, our method currently assumes a two-part articulated object; extending it to multi-part structures by sequentially capturing each moving part remains an important direction.”
Dai et al., Sec. 5
“In cases involving narrow or elongated structures or objects with extremely low texture, the tracked 3D trajectories often exhibit significant drift.”
Dai et al., Sec. 9
Evidence and comparison

The evidence supports the claim that FreeArtGS outperforms prior work in the free-moving regime: in Table 1, Video2Articulation shows $20.00\pm28.81$° axis error on revolute joints versus $1.04\pm1.03$° for FreeArtGS. That comparison favors FreeArtGS, however, because the baselines were not designed for free-moving scenarios. Comparisons on Video2Articulation-S demonstrate generalization to static-base settings. The ablation in Table 1 indicates that noise resistance is critical; removing it degrades axis error to $4.75\pm7.83$°.

“Ours: 1.04±1.03 vs Video2Articulation: 20.00±28.81”
Dai et al., Table 1
“Video2Articulation also performs poorly on our dataset, since Monst3R fails to predict the moving part in the free-moving scenario.”
Dai et al., Sec. 4.3
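For readers unfamiliar with the axis-error figures compared above, the sketch below shows one common way to compute such an error: the angle in degrees between the predicted and ground-truth joint axes. Treating an axis and its negation as the same direction is an assumption made here, and the function name `axis_angle_error_deg` is hypothetical; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def axis_angle_error_deg(a_pred, a_gt):
    """Angle in degrees between two joint-axis directions.

    Assumption: the sign of an axis is ambiguous, so the absolute value of
    the cosine is used and the result lies in [0, 90] degrees.
    """
    a_pred = np.asarray(a_pred, dtype=np.float64)
    a_gt = np.asarray(a_gt, dtype=np.float64)
    cos = abs(np.dot(a_pred, a_gt)) / (np.linalg.norm(a_pred) * np.linalg.norm(a_gt))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

# A prediction tilted by one degree away from the z-axis.
gt = np.array([0.0, 0.0, 1.0])
tilt = np.radians(1.0)
pred = np.array([0.0, np.sin(tilt), np.cos(tilt)])
err = axis_angle_error_deg(pred, gt)
print(round(err, 3))
```

On this scale, an average near 1° means the recovered axis is almost parallel to the ground truth, while an error near 20° with a spread of about 29° indicates frequent large misestimates.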
Reproducibility

The implementation uses NeRFStudio with detailed hyperparameters ($\lambda_m=200$, 30k iterations, Adam learning rate $10^{-4}$ for transforms) and reports a runtime of ~25 minutes on an RTX 4090 for 100 frames. However, the paper does not explicitly commit to a code release, stating only that "the project page is available." Reproduction requires specific off-the-shelf checkpoints (AllTracker, BundleSDF) and, for benchmark creation, a VR teleoperation setup, though the method description is detailed enough to attempt a reimplementation.

“Our implementation is based on NeRFStudio ... takes ∼25 minutes, including 6 minutes for part segmentation, 1 minute for joint estimation, and 18 minutes for end-to-end optimization.”
Dai et al., Sec. 4.2
Abstract

The increasing demand for augmented reality and robotics is driving the need for articulated object reconstruction with high scalability. However, existing settings for reconstructing from discrete articulation states or casual monocular videos require non-trivial axis alignment or suffer from insufficient coverage, limiting their applicability. In this paper, we introduce FreeArtGS, a novel method for reconstructing articulated objects under free-moving scenario, a new setting with a simple setup and high scalability. FreeArtGS combines free-moving part segmentation with joint estimation and end-to-end optimization, taking only a monocular RGB-D video as input. By optimizing with the priors from off-the-shelf point-tracking and feature models, the free-moving part segmentation module identifies rigid parts from relative motion under unconstrained capture. The joint estimation module calibrates the unified object-to-camera poses and recovers joint type and axis robustly from part segmentation. Finally, 3DGS-based end-to-end optimization is implemented to jointly reconstruct visual textures, geometry, and joint angles of the articulated object. We conduct experiments on two benchmarks and real-world free-moving articulated objects. Experimental results demonstrate that FreeArtGS consistently excels in reconstructing free-moving articulated objects and remains highly competitive in previous reconstruction settings, proving itself a practical and effective solution for realistic asset generation. The project page is available at: https://freeartgs.github.io/
