SARe: Structure-Aware Large-Scale 3D Fragment Reassembly
3D fragment reassembly becomes challenging at scale because incorrect contact adjacencies trigger cascading failures. This paper proposes SARe, a generative framework that explicitly models contact structure by jointly predicting fracture-surface tokens and inter-fragment adjacency graphs, paired with an inference-time refinement stage that anchors reliable substructures to correct uncertain regions. The work demonstrates state-of-the-art results across synthetic and real fracture datasets, with notable improvements in the many-fragment (large-$K$) regime.
The paper presents a compelling solution to the scalability bottleneck in 3D fragment reassembly by treating contact structure as a first-class prediction target rather than an implicit byproduct. The core insight, that explicit adjacency modeling prevents error propagation, is well supported by oracle experiments, and the proposed SARe-Refine mechanism provides a principled way to leverage predicted structure for iterative correction. However, although the method shows strong quantitative gains, the marginal improvement from the refinement stage on the full test set (+0.62% Part Accuracy on Breaking Bad Everyday) suggests that much of the benefit comes from the base generator, and the computational overhead of two-pass inference remains uncharacterized.
The explicit contact-graph prediction is conceptually sound and empirically validated; the ablation in Table 3 shows that removing the adjacency head causes a $6.48$ percentage point drop in Part Accuracy ($88.52\%$ vs $95.00\%$), underscoring its importance for preserving correct inter-fragment relations. The query-point-based conditioning scheme effectively grounds generation in Euclidean space without requiring structural pretraining, and the geometric verification step in SARe-Refine—checking voxel overlap and fracture region coverage—provides a physically meaningful filter for candidate edges that is interpretable and robust.
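To make the overlap half of that verification concrete, the following is a minimal sketch of a voxel-overlap test for one candidate contact edge. It is not the paper's implementation: the function names, the voxel size, and the default tolerance value for $\tau_o$ are assumptions chosen for illustration, and the fracture-region coverage check is omitted.

```python
import numpy as np

def occupied_voxels(points, voxel_size):
    """Map an (N, 3) point array to the set of integer voxel indices it occupies."""
    idx = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(int)
    return {tuple(row) for row in idx.tolist()}

def voxel_overlap_ratio(points_a, points_b, voxel_size=0.05):
    """Fraction of fragment A's occupied voxels that fragment B also occupies."""
    vox_a = occupied_voxels(points_a, voxel_size)
    vox_b = occupied_voxels(points_b, voxel_size)
    if not vox_a:
        return 0.0
    return len(vox_a & vox_b) / len(vox_a)

def passes_overlap_check(points_a, points_b, tau_o=0.1, voxel_size=0.05):
    """Accept a candidate edge only if interpenetration stays within tau_o.

    tau_o = 0.1 is a hypothetical value; the paper defers its thresholds
    to the supplementary material.
    """
    return voxel_overlap_ratio(points_a, points_b, voxel_size) <= tau_o

# Two posed fragments: half of A's voxels are shared with B, so the edge is rejected.
frag_a = np.array([[0.01, 0.01, 0.01], [0.11, 0.01, 0.01]])
frag_b = np.array([[0.02, 0.02, 0.02]])
print(voxel_overlap_ratio(frag_a, frag_b), passes_overlap_check(frag_a, frag_b))
```

A check of this kind is interpretable because a rejected edge points directly at two fragments that interpenetrate under the predicted poses.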
Several limitations temper the claims. First, SARe-Refine brings little benefit when the first-pass result is already accurate; the gains are concentrated on the $24.98\%$ of samples where the base model falls short ($\mathrm{PA}\leq 95\%$), limiting its utility as a universal post-processor. Second, while the paper claims to "build and release" a new benchmark from OmniObject3D, no URL or repository link appears in the main text (only "see Appendix for details"), so it is unclear whether this dataset contribution can be reproduced. Third, comparisons to PuzzleFusion++ and other baselines rely on numbers reported under different experimental settings (e.g., $K\leq 20$) rather than direct reproduction under the $K\in[2,50]$ protocol used for GARF and RPF, potentially confounding scalability comparisons.
Additionally, the analysis of failure modes remains superficial: it attributes failures primarily to "thin, slice-like pieces" without quantitative error analysis or discussion of algorithmic limitations in handling rotational symmetries or repetitive fragment geometries.
The evidence broadly supports the central thesis that contact-structure errors drive scalability failures; Fig. 1 demonstrates that adjacency recall drops with fragment count ($K$) and that oracle ground-truth adjacencies improve assembly quality. The claim of "state-of-the-art performance" is supported by Table 1, where SARe-Gen achieves $94.98\%$ Part Accuracy on Breaking Bad Everyday versus $83.07\%$ for RPF and $81.46\%$ for GARF under the same large-$K$ protocol. However, the paper acknowledges that baseline comparisons for PuzzleFusion++ and SE(3)-Equiv use previously reported results from smaller-$K$ settings, which may not fairly characterize their performance at scale. The internal ablations (Tables 3-5) rigorously isolate the contribution of structural heads and attachment layers, providing strong evidence that intermediate layers ($\ell_s=4$) optimally balance pose generation and structure prediction.
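For readers unfamiliar with the adjacency-recall metric referenced above, a minimal sketch follows. It assumes contact edges are undirected pairs of fragment indices and that recall is the fraction of ground-truth edges recovered; the paper's exact definition may differ, and the function name is hypothetical.

```python
def adjacency_recall(pred_edges, gt_edges):
    """Fraction of ground-truth contact edges that appear in the prediction.

    Edges are treated as undirected, so (i, j) and (j, i) count as the same edge.
    """
    pred = {frozenset(e) for e in pred_edges}
    gt = {frozenset(e) for e in gt_edges}
    if not gt:
        return 1.0  # nothing to recover
    return len(pred & gt) / len(gt)

# Four true contacts; the prediction recovers two of them plus one spurious edge.
gt = [(0, 1), (1, 2), (2, 3), (0, 3)]
pred = [(1, 0), (2, 1), (4, 5)]
print(adjacency_recall(pred, gt))  # 0.5
```

Recall alone ignores spurious edges such as `(4, 5)` in the example, so a precision figure would be needed to judge how often false contacts mislead the assembly.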
The experimental setup is documented with reasonable detail: the architecture uses a 12-block DiT transformer with frozen ShapeVAE encoding, trained with AdamW (learning rate $1\times 10^{-4}$, $\lambda_F=\lambda_A=0.01$) for 100 epochs, and inference uses Euler sampling with 50 steps and $M=5120$ query points. However, the paper does not explicitly state whether code will be released, and while it mentions releasing the OmniObject3D-derived benchmark, no access information appears in the main text. Critical hyperparameters for SARe-Refine (voxel tolerance thresholds $\tau_o$, minimum component sizes, blending strength $\alpha=0.5$) are deferred to supplementary material. Furthermore, the computational cost of the two-stage pipeline—particularly the voxelization-based geometric verification and the RePaint-style resampling—is not quantified, which would be essential for assessing practical deployment in heritage or robotic applications.
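As a rough picture of what a 50-step Euler sampler costs, the sketch below integrates a velocity field from $t=0$ to $t=1$ with one function evaluation per step. It assumes a flow-style formulation with a learned velocity network, which the review text does not confirm; `velocity_fn` and the toy field are placeholders, not the paper's model.

```python
import numpy as np

def euler_sample(velocity_fn, x0, num_steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)  # one network call per step in a real sampler
    return x

# Toy stand-in for a trained network: a straight-line flow toward a fixed target.
target = np.array([1.0, -2.0, 0.5])

def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

sample = euler_sample(toy_velocity, np.zeros(3))
print(np.allclose(sample, target))
```

Under this assumption each pass costs `num_steps` forward evaluations of the generator, so a second resampling pass in SARe-Refine would roughly double that count before the voxel checks are added, which is why a runtime breakdown would be informative.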
3D fragment reassembly aims to recover the rigid poses of unordered fragment point clouds or meshes in a common object coordinate system to reconstruct the complete shape. The problem becomes particularly challenging as the number of fragments grows, since the target shape is unknown and fragments provide weak semantic cues. Existing end-to-end approaches are prone to cascading failures due to unreliable contact reasoning, most notably inaccurate fragment adjacencies. To address this, we propose Structure-Aware Reassembly (SARe), a generative framework with SARe-Gen for Euclidean-space assembly generation and SARe-Refine for inference-time refinement, with explicit contact modeling. SARe-Gen jointly predicts fracture-surface token probabilities and an inter-fragment contact graph to localize contact regions and infer candidate adjacencies. It adopts a query-point-based conditioning scheme and extracts aligned local geometric tokens at query locations from a frozen geometry encoder, yielding queryable structural representations without additional structural pretraining. We further introduce an inference-time refinement stage, SARe-Refine. By verifying candidate contact edges with geometric-consistency checks, it selects reliable substructures and resamples the remaining uncertain regions while keeping verified parts fixed, leading to more stable and consistent assemblies in the many-fragment regime. We evaluate SARe across three settings, including synthetic fractures, simulated fractures from scanned real objects, and real physically fractured scans. The results demonstrate state-of-the-art performance, with more graceful degradation and higher success rates as the fragment count increases in challenging large-scale reassembly.