Feed - arxlens

0

GeoFusion-CAD: Structure-Aware Diffusion with Geometric State Space for Parametric 3D Design

cs.CV cs.GR Xiaolei Zhou, Chuangjie Fang, Jie Wu et al. · Mar 23, 2026

GeoFusion-CAD tackles the scalability bottleneck in parametric CAD generation, where Transformer-based methods struggle with long command sequences due to quadratic attention costs. The authors propose an end-to-end diffusion framework that encodes CAD programs as hierarchical trees and processes them with G-Mamba blocks—geometry-conditioned state-space models that achieve linear complexity $\mathcal{O}(Ld)$ while capturing geometric and topological dependencies. This enables scaling to sequences of up to 240 commands while maintaining high geometric fidelity.

Parametric Computer-Aided Design (CAD) is fundamental to modern 3D modeling, yet existing methods struggle to generate long command sequences, especially under complex geometric and topological dependencies. Transformer-based architectures dominate CAD sequence generation due to their strong dependency modeling, but their quadratic attention cost and limited context windowing hinder scalability to long programs. We propose GeoFusion-CAD, an end-to-end diffusion framework for scalable and structure-aware generation. Our proposal encodes CAD programs as hierarchical trees, jointly capturing geometry and topology within a state-space diffusion process. Specifically, a lightweight C-Mamba block models long-range structural dependencies through selective state transitions, enabling coherent generation across extended command sequences. To support long-sequence evaluation, we introduce DeepCAD-240, an extended benchmark that increases the sequence length ranging from 40 to 240 while preserving sketch-extrusion semantics from the ABC dataset. Extensive experiments demonstrate that GeoFusion-CAD achieves superior performance on both short and long command ranges, maintaining high geometric fidelity and topological consistency where Transformer-based models degrade. Our approach sets new state-of-the-art scores for long-sequence parametric CAD generation, establishing a scalable foundation for next-generation CAD modeling systems. Code and datasets are available at GitHub.

Read abstractHide abstract

0

FreeArtGS: Articulated Gaussian Splatting Under Free-moving Scenario

cs.CV cs.GR cs.RO Hang Dai, Hongwei Fan, Han Zhang et al. · Mar 23, 2026

Articulated object reconstruction typically requires either multi-view capture of discrete states or monocular video with a strict static-base-part assumption, limiting practical deployment. FreeArtGS introduces a "free-moving" setting where both joint angles and object poses vary arbitrarily during capture, using only a monocular RGB-D video. The method combines motion-based part segmentation via point tracking priors with joint estimation and 3D Gaussian Splatting optimization to jointly reconstruct geometry, appearance, and articulation.

The increasing demand for augmented reality and robotics is driving the need for articulated object reconstruction with high scalability. However, existing settings for reconstructing from discrete articulation states or casual monocular videos require non-trivial axis alignment or suffer from insufficient coverage, limiting their applicability. In this paper, we introduce FreeArtGS, a novel method for reconstructing articulated objects under free-moving scenario, a new setting with a simple setup and high scalability. FreeArtGS combines free-moving part segmentation with joint estimation and end-to-end optimization, taking only a monocular RGB-D video as input. By optimizing with the priors from off-the-shelf point-tracking and feature models, the free-moving part segmentation module identifies rigid parts from relative motion under unconstrained capture. The joint estimation module calibrates the unified object-to-camera poses and recovers joint type and axis robustly from part segmentation. Finally, 3DGS-based end-to-end optimization is implemented to jointly reconstruct visual textures, geometry, and joint angles of the articulated object. We conduct experiments on two benchmarks and real-world free-moving articulated objects. Experimental results demonstrate that FreeArtGS consistently excels in reconstructing free-moving articulated objects and remains highly competitive in previous reconstruction settings, proving itself a practical and effective solution for realistic asset generation. The project page is available at: https://freeartgs.github.io/

Read abstractHide abstract

0

End-to-End Training for Unified Tokenization and Latent Denoising

cs.CV cs.AI cs.GR Shivam Duggal, Xingjian Bai, Zongze Wu et al. · Mar 23, 2026

Traditional latent diffusion models require staging—first train a VAE tokenizer, freeze it, then train a diffusion model on top. UNITE proposes a single-stage approach where a shared "Generative Encoder" serves as both tokenizer and denoiser via weight sharing, achieving FID 1.73 on ImageNet 256×256 without adversarial losses or pretrained encoders like DINOv2.

Latent diffusion models (LDMs) enable high-fidelity synthesis by operating in learned latent spaces. However, training state-of-the-art LDMs requires complex staging: a tokenizer must be trained first, before the diffusion model can be trained in the frozen latent space. We propose UNITE - an autoencoder architecture for unified tokenization and latent diffusion. UNITE consists of a Generative Encoder that serves as both image tokenizer and latent generator via weight sharing. Our key insight is that tokenization and generation can be viewed as the same latent inference problem under different conditioning regimes: tokenization infers latents from fully observed images, whereas generation infers them from noise together with text or class conditioning. Motivated by this, we introduce a single-stage training procedure that jointly optimizes both tasks via two forward passes through the same Generative Encoder. The shared parameters enable gradients to jointly shape the latent space, encouraging a "common latent language". Across image and molecule modalities, UNITE achieves near state of the art performance without adversarial losses or pretrained encoders (e.g., DINO), reaching FID 2.12 and 1.73 for Base and Large models on ImageNet 256 x 256. We further analyze the Generative Encoder through the lenses of representation alignment and compression. These results show that single stage joint training of tokenization & generation from scratch is feasible.

Read abstractHide abstract

0

Cross-Scenario Deraining Adaptation with Unpaired Data: Superpixel Structural Priors and Multi-Stage Pseudo-Rain Synthesis

cs.CV cs.AI cs.GR Kangbo Zhao, Miaoxin Guan, Xiang Chen et al. · Mar 23, 2026

This paper tackles the domain generalization problem in image deraining, where models trained on synthetic data fail catastrophically on out-of-distribution (OOD) real-world scenarios. The authors propose a three-stage pipeline—Superpixel Generation, Resolution-adaptive Fusion, and Pseudo-label Re-Synthesis—that adapts source-domain models to target domains using only unpaired rain-free images, eliminating the need for costly paired rainy data collection.

Image deraining plays a pivotal role in low-level computer vision, serving as a prerequisite for robust outdoor surveillance and autonomous driving systems. While deep learning paradigms have achieved remarkable success in firmly aligned settings, they often suffer from severe performance degradation when generalized to unseen Out-of-Distribution (OOD) scenarios. This failure stems primarily from the significant domain discrepancy between synthetic training datasets and the complex physical dynamics of real-world rain. To address these challenges, this paper proposes a pioneering cross-scenario deraining adaptation framework. Diverging from conventional approaches, our method obviates the requirements for paired rainy observations in the target domain, leveraging exclusively rain-free background images. We design a Superpixel Generation (Sup-Gen) module to extract stable structural priors from the source domain using Simple Linear Iterative Clustering. Subsequently, a Resolution-adaptive Fusion strategy is introduced to align these source structures with target backgrounds through texture similarity, ensuring the synthesis of diverse and realistic pseudo-data. Finally, we implement a pseudo-label re-Synthesize mechanism that employs multi-stage noise generation to simulate realistic rain streaks. This framework functions as a versatile plug-and-play module capable of seamless integration into arbitrary deraining architectures. Extensive experiments on state-of-the-art models demonstrate that our approach yields remarkable PSNR gains of up to 32% to 59% in OOD domains while significantly accelerating training convergence.

Read abstractHide abstract

0

RefracGS: Novel View Synthesis Through Refractive Water Surfaces with 3D Gaussian Ray Tracing

cs.CV cs.GR Yiming Shao, Qiyu Dai, Chong Gao et al. · Mar 23, 2026

Novel view synthesis (NVS) through non-planar refractive surfaces presents fundamental challenges due to severe, spatially varying optical distortions. While recent representations like NeRF and 3D Gaussian Splatting (3DGS) excel at NVS, their assumption of straight-line ray propagation fails under these conditions, leading to significant artifacts. To overcome this limitation, we introduce RefracGS, a framework that jointly reconstructs the refractive water surface and the scene beneath the interface. Our key insight is to explicitly decouple the refractive boundary from the target objects: the refractive surface is modeled via a neural height field, capturing wave geometry, while the underlying scene is represented as a 3D Gaussian field. We formulate a refraction-aware Gaussian ray tracing approach that accurately computes non-linear ray trajectories using Snell's law and efficiently renders the underlying Gaussian field while backpropagating the loss gradients to the parameterized refractive surface. Through end-to-end joint optimization of both representations, our method ensures high-fidelity NVS and view-consistent surface recovery. Experiments on both synthetic and real-world scenes with complex waves demonstrate that RefracGS outperforms prior refractive methods in visual quality, while achieving 15x faster training and real-time rendering at 200 FPS. The project page for RefracGS is available at https://yimgshao.github.io/refracgs/.

Read abstractHide abstract

Nothing here yet