GeoFusion-CAD tackles the scalability bottleneck in parametric CAD generation, where Transformer-based methods struggle with long command sequences because attention cost grows quadratically with sequence length. The authors propose an end-to-end diffusion framework that encodes CAD programs as hierarchical trees and processes them with G-Mamba blocks: geometry-conditioned state-space models that run in $\mathcal{O}(Ld)$ time, linear in the sequence length $L$, while capturing geometric and topological dependencies. This enables scaling to sequences of up to 240 commands while maintaining high geometric fidelity.
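To make the scaling argument concrete, here is a minimal sketch of why a state-space recurrence costs $\mathcal{O}(Ld)$ while full self-attention scores $L^2$ token pairs. It is a generic diagonal linear recurrence, not the paper's G-Mamba block, and it omits any geometry conditioning. The function names, the `decay` and `gain` parameters, and the reading of $L$ as the number of commands and $d$ as the feature width are assumptions made for illustration.

```python
from typing import List


def diagonal_ssm_scan(
    xs: List[List[float]], decay: List[float], gain: List[float]
) -> List[List[float]]:
    """Run h_t = decay * h_{t-1} + gain * x_t elementwise over a sequence.

    xs holds L rows of d features. Each step does O(d) work, so the whole
    scan is O(L * d). Hypothetical helper, not taken from the paper.
    """
    d = len(decay)
    state = [0.0] * d  # h_0 = 0
    outputs = []
    for x in xs:  # one pass over the sequence: L steps
        state = [decay[i] * state[i] + gain[i] * x[i] for i in range(d)]
        outputs.append(list(state))
    return outputs


def attention_pair_count(seq_len: int) -> int:
    """Number of query-key pairs scored by full (unmasked) self-attention."""
    return seq_len * seq_len


def scan_step_count(seq_len: int) -> int:
    """Number of recurrence steps a sequential scan takes."""
    return seq_len


if __name__ == "__main__":
    seq_len = 240  # the longest sequence length quoted in the summary
    print("attention pairs:", attention_pair_count(seq_len))
    print("scan steps:", scan_step_count(seq_len))
    print(diagonal_ssm_scan([[1.0], [1.0], [1.0]], decay=[0.5], gain=[1.0]))
```

At 240 commands, full attention scores 57,600 pairs per head and layer, whereas the scan takes 240 steps, each costing work proportional to $d$.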
Articulated object reconstruction typically requires either multi-view capture of discrete states or monocular video with a strict static-base-part assumption, limiting practical deployment. FreeArtGS introduces a "free-moving" setting where both joint angles and object poses vary arbitrarily during capture, using only a monocular RGB-D video. The method combines motion-based part segmentation via point tracking priors with joint estimation and 3D Gaussian Splatting optimization to jointly reconstruct geometry, appearance, and articulation.
Traditional latent diffusion models are trained in stages: first train a VAE tokenizer, freeze it, then train a diffusion model on top. UNITE proposes a single-stage approach in which a shared "Generative Encoder" serves as both tokenizer and denoiser via weight sharing, achieving FID 1.73 on ImageNet 256×256 without adversarial losses or pretrained encoders like DINOv2.
This paper tackles the domain generalization problem in image deraining, where models trained on synthetic data fail catastrophically on out-of-distribution (OOD) real-world scenarios. The authors propose a three-stage pipeline—Superpixel Generation, Resolution-adaptive Fusion, and Pseudo-label Re-Synthesis—that adapts source-domain models to target domains using only unpaired rain-free images, eliminating the need for costly paired rainy data collection.