SARe: Structure-Aware Large-Scale 3D Fragment Reassembly

cs.CV Hanze Jia, Chunshi Wang, Yuxiao Yang, Zhonghua Jiang, Yawei Luo, Shuainan Ye, Tan Tang · Mar 23, 2026
Plain-language introduction

3D fragment reassembly becomes challenging at scale because incorrect contact adjacencies trigger cascading failures. This paper proposes SARe, a generative framework that explicitly models contact structure by jointly predicting fracture-surface tokens and inter-fragment adjacency graphs, paired with an inference-time refinement stage that anchors reliable substructures to correct uncertain regions. The work reports state-of-the-art results across synthetic and real fracture datasets, with notable improvements in the many-fragment (large-$K$) regime.

Critical review
Verdict
Bottom line

The paper presents a compelling solution to the scalability bottleneck in 3D fragment reassembly by treating contact structure as a first-class prediction target rather than an implicit byproduct. The core insight, that explicit adjacency modeling prevents error propagation, is well supported by oracle experiments, and the proposed SARe-Refine mechanism provides a principled way to leverage predicted structure for iterative correction. However, while the method shows strong quantitative gains, the refinement stage adds only a marginal improvement on the full test set (+0.62 percentage points of Part Accuracy on Breaking Bad Everyday, from 94.98% to 95.60%). This suggests that much of the benefit comes from the base generator. The computational overhead of two-pass inference also remains uncharacterized.

“Existing end-to-end approaches are prone to cascading failures due to unreliable contact reasoning, most notably inaccurate fragment adjacencies.”
paper · Abstract
“Overall: 94.98 → 95.60 (+0.62 pp) on Everyday”
paper · Table 2
What holds up

The explicit contact-graph prediction is conceptually sound and empirically validated: the ablation in Table 3 shows that removing the adjacency head causes a $6.48$-percentage-point drop in Part Accuracy ($88.52\%$ vs $95.00\%$), underscoring its importance for preserving correct inter-fragment relations. The query-point-based conditioning scheme grounds generation in Euclidean space without requiring structural pretraining. The geometric verification step in SARe-Refine, which checks voxel overlap and fracture-region coverage, provides a physically meaningful and interpretable filter for candidate edges.

“w/o-adj. head: PA 88.52%; Ours: PA 95.00%”
paper · Table 3
“Pairs whose fragments interpenetrate—measured by the voxel overlap ratio $r_{ij}=|S_{i}^{v}\cap S_{j}^{v}|/\min(|S_{i}^{v}|,|S_{j}^{v}|)>\tau_{o}$—are discarded; among the rest, only pairs whose fracture regions sufficiently cover each other within a small voxel tolerance are retained.”
paper · Section 4.3
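
The overlap test quoted above can be illustrated with a minimal sketch. It assumes each fragment has already been posed and is represented as a set of occupied integer voxel cells; the helper names (`voxelize`, `overlap_ratio`, `interpenetrates`), the voxel size, and the threshold value are hypothetical, since the paper defers $\tau_o$ to the supplementary material.

```python
import math
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Tuple[float, float, float]], voxel_size: float) -> Set[Voxel]:
    """Map 3D points to the set of integer voxel cells they fall into."""
    return {tuple(int(math.floor(c / voxel_size)) for c in p) for p in points}


def overlap_ratio(s_i: Set[Voxel], s_j: Set[Voxel]) -> float:
    """r_ij = |S_i ∩ S_j| / min(|S_i|, |S_j|), as in the quoted formula."""
    if not s_i or not s_j:
        return 0.0  # assumption: an empty fragment cannot interpenetrate
    return len(s_i & s_j) / min(len(s_i), len(s_j))


def interpenetrates(s_i: Set[Voxel], s_j: Set[Voxel], tau_o: float) -> bool:
    """A candidate pair is discarded when its ratio strictly exceeds tau_o."""
    return overlap_ratio(s_i, s_j) > tau_o


# Toy example: two voxel sets that share exactly one cell.
frag_a = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
frag_b = {(3, 0, 0), (4, 0, 0)}
ratio = overlap_ratio(frag_a, frag_b)  # 1 shared cell / min(4, 2)
```

Normalizing by the smaller fragment means a small piece buried inside a large one still yields a high ratio, which is presumably why the formula uses $\min$ rather than the union.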
Main concerns

Several limitations temper the claims. First, SARe-Refine brings little benefit when the first-pass result is already accurate; the gains are concentrated on the $24.98\%$ of Everyday samples where the first-pass result is imperfect ($\mathrm{PA}\leq 95\%$), limiting its utility as a universal post-processor. Second, while the paper claims to "build and release" a new benchmark from OmniObject3D, no URL or repository link appears in the main text (only "see Appendix for details"), so the reproducibility of this dataset contribution is unclear. Third, comparisons to PuzzleFusion++ and other baselines rely on numbers reported under different experimental settings (e.g., $K\leq 20$) rather than direct reproduction under the $K\in[2,50]$ protocol used for GARF and RPF, potentially confounding scalability comparisons.

Additionally, the analysis of failure modes remains superficial: it attributes failures primarily to "thin, slice-like pieces" but offers no quantitative error analysis and no discussion of algorithmic limitations in handling rotational symmetries or repetitive fragment geometries.

“SARe-Refine brings little benefit when the first-pass SARe result is already highly accurate... Aggregating the bins with first-pass PA$_{\text{SARe}}\leq 95\%$, this harder subset accounts for 24.98% (Everyday)”
paper · Section 5.2
“We build and release a large-scale fracture reassembly benchmark derived from real-world scanned objects... (see Appendix for details)”
paper · Section 1
“They mostly occur when the object is fragmented into many thin, slice-like pieces, where distinctive geometric cues and stable contact regions are limited.”
paper · Section 5.3
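
A back-of-envelope check makes the first concern concrete. The sketch below assumes, which the paper does not state, that Part Accuracy is averaged per sample and that samples outside the harder subset see zero average gain from refinement; under those assumptions the overall +0.62 pp gain implies a mean gain of roughly 2.5 pp on the harder subset. The function name and the `rest_gain_pp` parameter are hypothetical.

```python
def implied_subset_gain(overall_gain_pp: float,
                        subset_fraction: float,
                        rest_gain_pp: float = 0.0) -> float:
    """Mean gain on a subset, given the overall mean gain.

    Solves: overall = f * subset_gain + (1 - f) * rest_gain for subset_gain.
    """
    if not 0.0 < subset_fraction <= 1.0:
        raise ValueError("subset_fraction must be in (0, 1]")
    return (overall_gain_pp - (1.0 - subset_fraction) * rest_gain_pp) / subset_fraction


# Figures quoted from the paper: +0.62 pp overall, 24.98% harder subset (Everyday).
gain_on_hard_subset = implied_subset_gain(0.62, 0.2498)
print(f"{gain_on_hard_subset:.2f} pp")
```

If refinement slightly degrades already-accurate samples, the implied gain on the harder subset would be larger; if it slightly helps them, smaller.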
Evidence and comparison

The evidence broadly supports the central thesis that contact-structure errors drive scalability failures; Fig. 1 shows that adjacency recall drops as the fragment count $K$ grows and that conditioning on a small set of ground-truth adjacencies improves an example assembly. The claim of "state-of-the-art performance" is supported by Table 1, where SARe-Gen achieves $94.98\%$ Part Accuracy on Breaking Bad Everyday versus $83.07\%$ for RPF and $81.46\%$ for GARF under the same large-$K$ protocol. However, the paper acknowledges that baseline comparisons for PuzzleFusion++ and SE(3)-Equiv use previously reported results from smaller-$K$ settings, which may not fairly characterize their performance at scale. The internal ablations (Tables 3-5) isolate the contribution of the structural heads and attachment layers, providing evidence that an intermediate attachment layer ($\ell_s=4$) gives the best balance between pose generation and structure prediction among the settings tested.

“As the fragment count $K$ increases, (a) part accuracy drops and (b) adjacency recall of the induced contact graph decreases accordingly... (c) As an oracle diagnostic, re-inference of RPF conditioned on a small set of GT adjacencies improves an example assembly.”
paper · Figure 1
“For the remaining baselines, we report numbers from official papers or repositories, which are reported under smaller-$K$ settings (e.g., $K\leq 20$) and thus serve as reference results.”
paper · Section 5.1
“Layer 4: PA 92.75%, Adj. Prec. 92.94%”
paper · Table 4
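
For readers unfamiliar with the adjacency metrics cited above, the following is a minimal sketch of precision and recall over an undirected contact graph. The paper's exact metric definition is not reproduced in this review, so the edge normalization (unordered pairs, self-loops dropped) is an assumption, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[int, int]


def normalize(edges: Iterable[Edge]) -> Set[FrozenSet[int]]:
    """Treat edges as unordered fragment pairs and drop self-loops."""
    return {frozenset(e) for e in edges if e[0] != e[1]}


def adjacency_precision_recall(pred_edges: Iterable[Edge],
                               gt_edges: Iterable[Edge]) -> Tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |ground truth|."""
    pred, gt = normalize(pred_edges), normalize(gt_edges)
    true_positives = len(pred & gt)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gt) if gt else 0.0
    return precision, recall


# Toy example: a 4-fragment chain 0-1-2-3 as ground truth.
gt = [(0, 1), (1, 2), (2, 3)]
pred = [(1, 0), (2, 3), (0, 3), (0, 2)]  # two correct, two spurious
precision, recall = adjacency_precision_recall(pred, gt)
```

Low recall means true contacts are missed, so the assembly lacks constraints; low precision means spurious contacts are asserted, which is the error mode most likely to cascade.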
Reproducibility

The experimental setup is documented with reasonable detail: the architecture uses a 12-block DiT transformer with frozen ShapeVAE encoding, trained with AdamW (learning rate $1\times 10^{-4}$, $\lambda_F=\lambda_A=0.01$) for 100 epochs, and inference uses Euler sampling with 50 steps and $M=5120$ query points. However, the paper does not explicitly state whether code will be released, and while it mentions releasing the OmniObject3D-derived benchmark, no access information appears in the main text. The blending strength for SARe-Refine ($\alpha=0.5$) is given in Section 5.1, but other critical hyperparameters (the overlap threshold $\tau_o$, the voxel tolerance, and minimum component sizes) are deferred to the supplementary material. Furthermore, the computational cost of the two-stage pipeline, particularly the voxelization-based geometric verification and the RePaint-style resampling, is not quantified; such figures would be essential for assessing practical deployment in heritage or robotic applications.

“Our generator is a DiT-style transformer with 12 blocks... We train the model for 100 epochs using AdamW with a learning rate of $1\times 10^{-4}$, and set the loss weights to $\lambda_F=\lambda_A=0.01$... For SARe-Refine, we set the blending strength to $\alpha=0.5$.”
paper · Section 5.1
“We build and release a large-scale fracture reassembly benchmark derived from real-world scanned objects”
paper · Section 1
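
To make the "Euler sampling with 50 steps" setting concrete, here is a minimal sketch of fixed-step Euler integration of a velocity field from $t=0$ to $t=1$. The time direction, the flat-vector state, and the toy velocity field are assumptions for illustration; in the paper the velocity would come from the trained DiT generator, which is not reproduced here.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def euler_sample(x0: Vector, velocity: VelocityFn, num_steps: int = 50) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # velocity is evaluated at the start of each step
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target: Vector) -> VelocityFn:
    """Hypothetical stand-in for a learned model: points straight at `target`."""
    def v(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return v


start = [0.0, 0.0, 0.0]
target = [1.0, -2.0, 0.5]
sample = euler_sample(start, toy_velocity(target), num_steps=50)
```

Each step costs one forward pass of the generator, so a 50-step first pass followed by a resampling pass is exactly the kind of cost the review asks the authors to quantify.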
Abstract

3D fragment reassembly aims to recover the rigid poses of unordered fragment point clouds or meshes in a common object coordinate system to reconstruct the complete shape. The problem becomes particularly challenging as the number of fragments grows, since the target shape is unknown and fragments provide weak semantic cues. Existing end-to-end approaches are prone to cascading failures due to unreliable contact reasoning, most notably inaccurate fragment adjacencies. To address this, we propose Structure-Aware Reassembly (SARe), a generative framework with SARe-Gen for Euclidean-space assembly generation and SARe-Refine for inference-time refinement, with explicit contact modeling. SARe-Gen jointly predicts fracture-surface token probabilities and an inter-fragment contact graph to localize contact regions and infer candidate adjacencies. It adopts a query-point-based conditioning scheme and extracts aligned local geometric tokens at query locations from a frozen geometry encoder, yielding queryable structural representations without additional structural pretraining. We further introduce an inference-time refinement stage, SARe-Refine. By verifying candidate contact edges with geometric-consistency checks, it selects reliable substructures and resamples the remaining uncertain regions while keeping verified parts fixed, leading to more stable and consistent assemblies in the many-fragment regime. We evaluate SARe across three settings, including synthetic fractures, simulated fractures from scanned real objects, and real physically fractured scans. The results demonstrate state-of-the-art performance, with more graceful degradation and higher success rates as the fragment count increases in challenging large-scale reassembly.
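
The "select reliable substructures, resample the rest" step described in the abstract can be sketched as a connected-components pass over verified contact edges. This is an illustrative assumption rather than the authors' procedure: the review only notes that minimum component sizes are deferred to the supplementary material, so the `min_size` value and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(num_fragments: int,
                         verified_edges: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Group fragments linked by verified contact edges (iterative DFS)."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for i, j in verified_edges:
        adj[i].add(j)
        adj[j].add(i)
    seen: Set[int] = set()
    components: List[List[int]] = []
    for start in range(num_fragments):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(comp))
    return components


def split_fixed_and_resampled(num_fragments: int,
                              verified_edges: Iterable[Tuple[int, int]],
                              min_size: int = 3) -> Tuple[List[int], List[int]]:
    """Keep fragments in large verified components fixed; resample the others."""
    fixed: List[int] = []
    resample: List[int] = []
    for comp in connected_components(num_fragments, verified_edges):
        (fixed if len(comp) >= min_size else resample).extend(comp)
    return sorted(fixed), sorted(resample)


# Toy example: 6 fragments, verified contacts 0-1, 1-2, and 3-4.
fixed, resample = split_fixed_and_resampled(6, [(0, 1), (1, 2), (3, 4)], min_size=3)
```

In this toy case fragments 0, 1, and 2 form a component large enough to anchor, while the pair 3-4 and the isolated fragment 5 would be resampled.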
