A Large-Scale Remote Sensing Dataset and VLM-based Algorithm for Fine-Grained Road Hierarchy Classification
Most road extraction benchmarks focus on binary segmentation, lacking the hierarchical attributes critical for transport infrastructure planning and management. This paper introduces SYSU-HiRoads, a large-scale dataset spanning 3,631 km² with aligned pixel masks, vector centerlines, and three-level road grades, alongside RoadReasoner—a framework that combines frequency-domain feature extraction with vision-language models to infer road hierarchy from geometric descriptors. The work bridges a significant gap in automated mapping by moving beyond "where are the roads" to "what roles do these roads play."
The paper offers a solid contribution to remote sensing infrastructure mapping through its comprehensive dataset and the novel application of VLMs to hierarchical road classification. The SYSU-HiRoads dataset is well-constructed with rigorous annotation protocols, and the ablation studies convincingly validate the proposed FORCE-Net modules. However, the reliance on proprietary GPT-v API calls for the T-HRN component introduces significant reproducibility risks, and the hierarchy classification performance (60.6% SegAcc) suggests the problem remains challenging. The claim to be "the first large-scale hierarchical road benchmark over Chinese cities" is reasonable given the specific combination of pixel/vector annotations and grade labels, though the geographic coverage is limited to Henan Province.
The dataset construction methodology is rigorous, combining Chinese administrative standards with expert review to create aligned pixel and vector annotations. The ablation studies (Table 4) provide clear evidence that the FDE and PMSE modules offer complementary benefits: jointly integrating both improves IoU by 6.37% and F1 by 5.71% over the baseline. The geometric descriptor design (Table 5) thoughtfully encodes scale ($L$, $W$), shape ($S$, $C$), and network context ($D$, $\rho$) cues that correlate with functional road class. The comparison of VLM backbones (Table 7) is comprehensive, identifying DINOv2-ViT-B as superior to CLIP variants for grade discrimination, particularly on low-grade roads.
The framework's dependence on GPT-v for grade prediction creates a reproducibility bottleneck and potential instability due to API version changes or prompt sensitivity. The hierarchy classification accuracy (F1 64.2%, SegAcc 60.6%) remains moderate, suggesting that geometric descriptors alone may be insufficient for fine-grained discrimination without additional topological or land-use context. The authors acknowledge that "road hierarchy is inherently context-dependent" and that administrative classifications do not always align with functional roles, which introduces label ambiguity that the model does not explicitly handle. Furthermore, the evaluation on CHN6-CUG only tests binary extraction, not hierarchy classification, leaving the generalization of T-HRN largely unvalidated outside SYSU-HiRoads.
The road extraction results on CHN6-CUG demonstrate competitive performance (IoU 51.81%, F1 63.89%), surpassing RCFSNet by 5.02% IoU, though this benchmark lacks hierarchy labels and thus cannot evaluate the paper's core contribution. The ablation study in Table 6 shows that incorporating the CLIP-based VLM prior yields the most substantial gain for hierarchy classification (OA 66.8% $\rightarrow$ 71.9%), but the paper omits comparisons against non-VLM alternatives such as Random Forest or MLP classifiers using identical geometric features. Without these baselines, it is difficult to isolate whether the performance gains stem from language pre-training or merely from the multi-modal fusion architecture. The statement that RoadReasoner "surpasses state-of-the-art road extraction baselines" holds for binary masks but remains unverified for the hierarchical task due to the lack of competing methods.
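For readers comparing the extraction numbers above, the following is a minimal sketch of the standard pixel-level IoU and F1 definitions for binary road masks. It is not the authors' evaluation code: the function names and the toy masks are made up for illustration, and the review does not state whether the paper aggregates counts over the whole test set or averages per image.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives
    over two equally sized binary masks (lists of rows of 0/1)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou_and_f1(pred, truth):
    """Return (IoU, F1) for the road class.

    IoU = TP / (TP + FP + FN), F1 = 2TP / (2TP + FP + FN).
    Two empty masks are treated as a perfect match (assumption).
    """
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return iou, f1


# Toy 3x4 masks (hypothetical data): 4 TP, 1 FP, 1 FN.
pred = [[1, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 1, 0]]
truth = [[1, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 0]]

iou, f1 = iou_and_f1(pred, truth)
print(f"IoU={iou:.4f}  F1={f1:.4f}")
```

Neither metric uses hierarchy labels, which is why results on a binary benchmark such as CHN6-CUG say nothing about grade prediction.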
The SYSU-HiRoads dataset is publicly released via Zenodo with a DOI, supporting dataset reproducibility. Training details for FORCE-Net are reasonably complete, specifying batch size 2, Adam optimizer, initial learning rate $2\times 10^{-4}$, and step decay. However, the T-HRN component relies on GPT-v without disclosure of prompt templates, temperature settings, or API versioning, creating a significant barrier to independent reproduction. The geometric discretization thresholds for converting continuous measurements into textual categories (e.g., "short/medium/long") are not specified, and the weights $w_1=0.2$, $w_2=0.8$ in the endpoint matching degree formula $D_i$ are stated without adequate justification. While the authors state that "the dataset and code will be publicly released," the dependency on proprietary LLM APIs means full reproduction requires ongoing commercial service availability.
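To make the missing detail concrete, here is a hedged sketch of the kind of discretization step a reproduction would need to pin down: mapping continuous segment measurements to textual categories such as "short/medium/long" and composing a geometry-aware description. Every threshold, label set, function name, and the sentence template below is a hypothetical placeholder, since the paper does not disclose its own values or prompt format.

```python
from bisect import bisect_right

# HYPOTHETICAL cut points in metres; the paper's actual thresholds are not disclosed.
LENGTH_BINS_M = (200.0, 1000.0)   # < 200: short, 200 to < 1000: medium, >= 1000: long
WIDTH_BINS_M = (8.0, 20.0)        # < 8: narrow, 8 to < 20: medium, >= 20: wide
LENGTH_LABELS = ("short", "medium", "long")
WIDTH_LABELS = ("narrow", "medium", "wide")


def discretize(value, cut_points, labels):
    """Map a continuous value to the label of the bin it falls in.

    A value equal to a cut point falls into the upper bin.
    """
    return labels[bisect_right(cut_points, value)]


def describe_segment(length_m, width_m):
    """Build an illustrative textual description of one skeleton segment."""
    length_label = discretize(length_m, LENGTH_BINS_M, LENGTH_LABELS)
    width_label = discretize(width_m, WIDTH_BINS_M, WIDTH_LABELS)
    return (
        f"A {length_label}, {width_label} road segment "
        f"({length_m:.0f} m long, {width_m:.1f} m wide)."
    )


print(describe_segment(1500.0, 24.0))
print(describe_segment(120.0, 6.5))
```

Because the category a segment receives can change with the cut points alone, publishing these values together with the prompt template would remove one source of variation between reproduction attempts.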
In this work, we present SYSU-HiRoads, a large-scale hierarchical road dataset, and RoadReasoner, a vision-language-geometry framework for automatic multi-grade road mapping from remote sensing imagery. SYSU-HiRoads is built from GF-2 imagery covering 3631 km² in Henan Province, China, and contains 1079 image tiles at 0.8 m spatial resolution. Each tile is annotated with dense road masks, vectorized centerlines, and three-level hierarchy labels, enabling the joint training and evaluation of segmentation, topology reconstruction, and hierarchy classification. Building on this dataset, RoadReasoner is designed to generate robust road surface masks, topology-preserving road networks, and semantically coherent hierarchy assignments. We strengthen road feature representation and network connectivity by explicitly enhancing frequency-sensitive cues and multi-scale context. Moreover, we perform hierarchy inference at the skeleton-segment level with geometric descriptors and geometry-aware textual prompts, queried by vision-language models to obtain linguistically interpretable grade decisions. Experiments on SYSU-HiRoads and the CHN6-CUG dataset show that RoadReasoner surpasses state-of-the-art road extraction baselines and produces accurate and semantically consistent road hierarchy maps with 72.6% OA, 64.2% F1 score, and 60.6% SegAcc. The dataset and code will be publicly released to support automated transport infrastructure mapping, road inventory updating, and broader infrastructure management applications.