GeoFusion-CAD: Structure-Aware Diffusion with Geometric State Space for Parametric 3D Design

cs.CV cs.GR Xiaolei Zhou, Chuangjie Fang, Jie Wu, Jingyi Yang, Boyi Lin, Jianwei Zheng · Mar 23, 2026
AI Review
Plain-language introduction

GeoFusion-CAD tackles the scalability bottleneck in parametric CAD generation, where Transformer-based methods struggle with long command sequences due to quadratic attention costs. The authors propose an end-to-end diffusion framework that encodes CAD programs as hierarchical trees and processes them with G-Mamba blocks—geometry-conditioned state-space models that achieve linear complexity $\mathcal{O}(Ld)$ while capturing geometric and topological dependencies. This enables scaling to sequences of up to 240 commands while maintaining high geometric fidelity.
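To make the scaling argument concrete, the sketch below counts the dominant multiply-accumulate operations for self-attention ($L^2 d$) and for a linear-time scan ($Ld$). The width `d = 256` and all function names are illustrative assumptions rather than values from the paper, and constant factors are ignored.

```python
def attention_ops(seq_len: int, width: int) -> int:
    """Dominant cost of self-attention score computation: L * L * d."""
    return seq_len * seq_len * width


def scan_ops(seq_len: int, width: int) -> int:
    """Dominant cost of a linear-time state-space scan: L * d."""
    return seq_len * width


def cost_ratio(seq_len: int, width: int = 256) -> float:
    """How many times more work attention does than the scan.

    The default width of 256 is a hypothetical model size.
    """
    return attention_ops(seq_len, width) / scan_ops(seq_len, width)


if __name__ == "__main__":
    for seq_len in (40, 160, 240):
        print(seq_len, cost_ratio(seq_len))
```

Under these simplifications the ratio equals the sequence length itself, so the gap grows from 40x at 40 commands to 240x at 240 commands.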

Critical review
Verdict
Bottom line

The paper presents a technically sound and well-motivated approach to long-sequence CAD generation. The integration of hierarchical tree representations with state-space diffusion models is novel, and the efficiency gains are empirically validated on the proposed DeepCAD-240 benchmark. However, the evaluation is limited by the scarcity of truly long sequences in the dataset—only 0.21% exceed 160 commands—and some architectural details remain underspecified in the main text.

“160–240 ... 0.21”
Table S2 · Appendix A.5
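A rough way to see why a 0.21% share is thin evidence is to turn it into a sample count and a binomial standard error. The sketch below does this under loudly stated assumptions: the dataset size of 100,000 and the accuracy of 0.9 are hypothetical placeholders (this review does not report them), and each program is treated as an independent pass/fail trial, which is a simplification.

```python
import math


def bin_count(total_models: int, share_percent: float) -> int:
    """Number of models in a length bin, given its percentage share."""
    return round(total_models * share_percent / 100.0)


def accuracy_std_error(accuracy: float, n_samples: int) -> float:
    """Binomial standard error of an accuracy estimated from n samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


if __name__ == "__main__":
    total = 100_000  # hypothetical dataset size, NOT taken from the paper
    n_long = bin_count(total, 0.21)  # share of the 160-240 bin (Table S2)
    # 0.9 is a placeholder accuracy, not a reported result
    print(n_long, accuracy_std_error(0.9, n_long))
```

The smaller the bin, the wider the uncertainty on any metric computed only from it, which is the concern raised about the longest sequences.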
What holds up

The hierarchical tree representation effectively captures geometric dependencies, and the ablation study confirms its necessity: removing it drops command accuracy from 91.2 to 87.5 and COV by 4.5 points. The G-Mamba block delivers substantial efficiency gains, roughly halving memory usage relative to HNC-CAD (5198 MiB vs 10342 MiB) while improving long-sequence command accuracy by 8.4%. The linear-complexity claim holds: G-Mamba preserves the $\mathcal{O}(Ld)$ scaling of vanilla Mamba while introducing geometry-conditioned transitions.

“Removing the hierarchical encoding (w/o Tree) leads to a noticeable decline in performance, with command accuracy dropping from 91.2 to 87.5”
paper · Section 5.3
“G-Mamba preserves the same asymptotic complexity as vanilla Mamba: $T_{\text{G-Mamba}}(L,d)=\mathcal{O}(Ld)$”
paper · Appendix B.6.1
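The headline figures quoted above can be checked with simple arithmetic. The sketch below uses only numbers stated in this review; the helper name `relative_reduction` is an illustrative choice.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline * 100.0


if __name__ == "__main__":
    # Memory footprint in MiB: HNC-CAD vs GeoFusion-CAD
    memory_saving = relative_reduction(10342, 5198)
    # Command accuracy with and without the hierarchical tree encoding
    tree_ablation_drop = round(91.2 - 87.5, 1)
    print(f"memory saving: {memory_saving:.1f}%")
    print(f"accuracy drop without tree: {tree_ablation_drop} points")
```

The memory saving works out to about 49.7%, consistent with the "approximately 50%" description, and the ablation drop is 3.7 points of command accuracy.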
Main concerns

The DeepCAD-240 dataset extends the maximum sequence length to 240 but remains heavily skewed toward short sequences: 76.6% of programs have fewer than 40 commands, which limits how far the scalability claims can be validated on truly long programs. The geometric conditioning mechanism also relies on hand-designed features (scale, depth, curvature) whose sensitivity to noise is not analyzed. The improvement over vanilla Mamba is modest (91.2 vs 89.2 command accuracy, a 2.0-point gain), so it is unclear whether the geometric conditioning justifies its overhead. Finally, the paper does not compare against recent subquadratic architectures other than Mamba.

“76.6 ... 0.21”
Table S2 · Appendix A.5
“Vanilla Mamba ... 89.2 ... Ours ... 91.2”
Table 2 · Section 5.3
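The noise-sensitivity analysis the review finds missing could take roughly the following shape. This is a toy sketch and not the paper's G-Mamba: the scalar recurrence, the gate formula, the weights, and both function names are assumptions made purely for illustration. It runs a single pass over the sequence (linear in length) in which the decay gate depends on per-command geometric features, then measures how much the output moves when those features are perturbed.

```python
import math
import random


def softplus(x: float) -> float:
    """Smooth positive function log(1 + exp(x))."""
    return math.log1p(math.exp(x))


def geometry_gated_scan(inputs, geom_feats, weights):
    """Toy recurrence h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    a_t = exp(-softplus(w . g_t)) lies in (0, 1) and depends on the
    geometric features g_t (e.g. scale, depth, curvature) of command t.
    Hypothetical stand-in for a geometry-conditioned state transition.
    """
    state, outputs = 0.0, []
    for x_t, g_t in zip(inputs, geom_feats):
        gate_logit = sum(w * g for w, g in zip(weights, g_t))
        decay = math.exp(-softplus(gate_logit))
        state = decay * state + (1.0 - decay) * x_t
        outputs.append(state)
    return outputs


def noise_sensitivity(inputs, geom_feats, weights, sigma, seed=0):
    """Mean absolute output change under Gaussian feature noise."""
    rng = random.Random(seed)  # seeded so the probe is repeatable
    noisy = [[g + rng.gauss(0.0, sigma) for g in g_t] for g_t in geom_feats]
    clean_out = geometry_gated_scan(inputs, geom_feats, weights)
    noisy_out = geometry_gated_scan(inputs, noisy, weights)
    diffs = [abs(a - b) for a, b in zip(clean_out, noisy_out)]
    return sum(diffs) / len(diffs)
```

Sweeping `sigma` over a few values and plotting the resulting sensitivity would show how quickly the conditioned transitions degrade as the geometric features become noisy.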
Evidence and comparison

Quantitative results demonstrate clear advantages over DeepCAD, SkexGen, and HNC-CAD on the new benchmark, particularly for long sequences (40–240 commands), with improvements in COV (73.9 vs 71.2), MMD (1.12 vs 1.71), and JSD (2.97 vs 3.81). However, the baselines may not be optimally tuned for the extended sequence lengths, and the comparison does not include recent state-space alternatives. The visual comparisons (Fig. 5) show improved geometric coherence, though the authors acknowledge "minor boundary irregularities occasionally appear."

“GeoFusion-CAD attains 91.2 command accuracy and 73.9 COV, outperforming HNC-CAD by 8.4% and 2.7%, respectively”
paper · Section 5.2
“minor boundary irregularities occasionally appear”
paper · Section 5.2
Reproducibility

The authors commit to releasing code and datasets, and the supplementary material provides detailed tokenization schemes (Table S1) and architectural specifications. However, critical implementation details—such as the exact MLP architecture for $f_{\text{geom}}$, the discretization schedule for diffusion timesteps, and the specific curvature computation for $r_k$—are either omitted or relegated to supplementary material without reference in the main text. Training requires 1000 epochs on a single RTX 3090, which is feasible, though total training time is not reported. The loss coefficient $\eta=2$ is specified, but the sensitivity to this hyperparameter is not discussed.

“The number of MLP layers in the CAD decoder is set to 3 ... trained with a batch size of 512 for a total of 1000 epochs”
paper · Appendix C.1
“The coefficient $\eta$ balances parameter supervision”
paper · Section 4.5
Abstract

Parametric Computer-Aided Design (CAD) is fundamental to modern 3D modeling, yet existing methods struggle to generate long command sequences, especially under complex geometric and topological dependencies. Transformer-based architectures dominate CAD sequence generation due to their strong dependency modeling, but their quadratic attention cost and limited context windows hinder scalability to long programs. We propose GeoFusion-CAD, an end-to-end diffusion framework for scalable and structure-aware generation. Our proposal encodes CAD programs as hierarchical trees, jointly capturing geometry and topology within a state-space diffusion process. Specifically, a lightweight G-Mamba block models long-range structural dependencies through selective state transitions, enabling coherent generation across extended command sequences. To support long-sequence evaluation, we introduce DeepCAD-240, an extended benchmark that covers sequence lengths from 40 to 240 commands while preserving sketch-extrusion semantics from the ABC dataset. Extensive experiments demonstrate that GeoFusion-CAD achieves superior performance on both short and long command ranges, maintaining high geometric fidelity and topological consistency where Transformer-based models degrade. Our approach sets new state-of-the-art scores for long-sequence parametric CAD generation, establishing a scalable foundation for next-generation CAD modeling systems. Code and datasets are available at GitHub.
