Riverine Land Cover Mapping through Semantic Segmentation of Multispectral Point Clouds

cs.CV · Sopitta Thurachen, Josef Taher, Matti Lehtomäki, Leena Matikainen, Linnea Blåfield, Mikel Calle Navarro, Antero Kukko, Tomi Westerlund, Harri Kaartinen · Mar 23, 2026
AI Review
Plain-language introduction

Accurate riverine land cover mapping is essential for river management but challenging due to water penetration issues in 2D imagery and complex 3D structure. This paper applies Point Transformer v2 (PTv2)—using grouped vector attention and partition-based pooling—to multispectral LiDAR point clouds (1550 nm, 905 nm, 532 nm) for semantic segmentation of six land cover classes in Finnish river environments. The authors demonstrate that spectral features (particularly intensity and reflectance) combined with geometric data achieve $0.950$ mean IoU, and propose multi-dataset training with sparse annotations to improve cross-site generalization despite severe class imbalance.

Critical review
Verdict
Bottom line

The study presents a rigorous evaluation of PTv2 on a novel multispectral riverine dataset from the Oulanka River, achieving strong quantitative results ($\text{mIoU}=0.950$) that substantially exceed Random Forest ($0.682$), DGCNN ($0.744$), and RandLA-Net ($0.580$) baselines. The ablation studies are particularly thorough, isolating the contributions of individual LiDAR features and demonstrating that intensity and reflectance are the dominant spectral predictors for sediment classification. However, the generalization claims are limited by the restricted geographic scope—all data derives from a single river system in northern Finland, leaving performance on tropical, arid, or agricultural riparian zones unverified. Additionally, while the paper mentions sediment transport monitoring as an application, the experiments only demonstrate static classification at a single time point rather than temporal change detection.

“PTv2 achieved the highest mIoU with a score of 0.950, outperforming the second-best model (DGCNN at 0.744) by 0.206 points”
Thurachen et al., Table 13 · Section 5.4
“Sand and gravel mixtures in sediment bars or bank areas... can be challenging for the model to map for multiple reasons”
Thurachen et al., Section 5.5.3
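
The headline numbers in this verdict are mean IoU scores. As a reminder of what that metric measures, here is a minimal sketch of per-class IoU and mIoU computed from a confusion matrix, assuming the standard definition (true positives divided by the union of predicted and labelled points, averaged over classes). The function names and the toy matrix are illustrative only; the counts are made up and are not data from the paper, and the authors' exact evaluation code is not available.

```python
def per_class_iou(confusion):
    """Return IoU per class from a square confusion matrix.

    confusion[i][j] counts points whose true class is i and predicted class is j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted as something else
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, actually something else
        denom = tp + fp + fn
        # A class absent from both labels and predictions has no defined IoU.
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(confusion):
    """Average the defined per-class IoUs (classes with no points are skipped)."""
    valid = [v for v in per_class_iou(confusion) if v == v]  # v == v is False for NaN
    return sum(valid) / len(valid)


# Toy 3-class matrix with made-up counts, NOT data from the paper.
toy = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
ious = per_class_iou(toy)
miou = mean_iou(toy)
```

Because mIoU weights every class equally, a rare class such as sand or gravel affects the score as much as a dominant one, which is why the per-class results matter for judging the headline figure.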
What holds up

The multi-dataset training strategy effectively addresses the severe class imbalance between sites—where sand comprises only $0.5\%$ of the Nurmisaari (NS) dataset but $32.3\%$ of Honkaniemi (HN)—resulting in a dramatic improvement in generalization to the held-out Jäkälämutka (JM) test site ($\text{mIoU}$ increases from $0.773$ to $0.950$). The ablation studies provide clear evidence that spectral information is essential for sediment discrimination: the geometry-only baseline achieves IoU scores of merely $0.487$ for sand and $0.194$ for gravel, while adding Channel 1 (1550 nm) increases these to $0.849$ and $0.908$ respectively. The claim that intensity and reflectance features are the key drivers is well-supported by individual feature tests showing these achieve $\text{mIoU}>0.93$, compared to $<0.77$ for amplitude and deviation.

“The model trained only on the NS dataset achieved an mIoU of 0.773. However, introducing HN data into the training process improved the mIoU significantly to 0.950”
Thurachen et al., Table 12 · Section 5.3
“The baseline configuration using only 3D coordinates achieved modest performance (mIoU: 0.643, mAcc: 0.751). When including Channel 1 resulted in significant improvement in the performance (mIoU: 0.865, mAcc: 0.932)”
Thurachen et al., Table 8 · Section 5.1
“The use of intensity and reflectance demonstrated the highest mIoU of 0.937 and 0.934, respectively... In contrast, the use of amplitude and deviation resulted in significantly lower performance, with mIoU of 0.767 and 0.719”
Thurachen et al., Table 11 · Section 5.2
Main concerns

The primary limitation is the narrow geographic and environmental diversity: all three study sites belong to the same river system with similar climatic and sedimentological contexts, so the "generalization" demonstrated may simply reflect consistency within a single fluvial environment rather than robustness to true domain shifts. The annotation methodology relies on visual interpretation of orthophotos, which introduces subjectivity in transitional zones such as sand-gravel mixtures and shallow water over gravel—areas where the model indeed shows confusion and misclassification. Furthermore, the paper claims potential for "monitoring sediment transport" but provides no temporal data or change detection experiments; the dataset represents a single acquisition date (September 6, 2022). Computational practicalities are also underexplored—the authors note the pipeline is "time-consuming and computationally demanding" but provide no metrics for inference time, memory usage, or scalability to larger catchments.

“The sediment sorting creates gradual transitions rather than sharp boundaries, making it harder for the model to predict well-defined class edges”
Thurachen et al., Section 5.5.3
“The point cloud processing pipeline is often time-consuming and computationally demanding due to the complexity and the extensive number of parameters typical in transformer-based architectures”
Thurachen et al., Conclusion · Section 6
Evidence and comparison

The comparison against baselines is fair and well-documented, though RandLA-Net's unusually poor performance ($\text{mIoU}=0.580$, with low vegetation at $0.07$) suggests potential implementation or hyperparameter issues rather than inherent architectural inadequacy for this task. The spectral ablation studies are methodologically sound and clearly demonstrate diminishing returns: adding Channel 1 improves $\text{mIoU}$ by $0.222$ (from $0.643$ to $0.865$), Channel 2 adds a further $0.072$ (to $0.937$), and Channel 3 adds only $0.001$ (to $0.938$). However, the paper does not compare against more recent point cloud architectures (e.g., Point Transformer V3, Stratified Transformer) that might offer superior efficiency or accuracy. The spectral feature analysis also shows that amplitude and deviation provide limited value for riverine mapping, offering practical guidance for future sensor configurations.

“RandLA-Net... 0.580... Low veg. 0.07”
Thurachen et al., Table 13 · Section 5.4
“Incorporating both Channel 1 and Channel 2 further improved the performance (mIoU: 0.937, mAcc: 0.970). When combining Channel 1, Channel 2, and Channel 3, we observed only marginal gain in the result (mIoU: 0.938, mAcc: 0.971)”
Thurachen et al., Table 8 · Section 5.1
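
The diminishing returns can be checked directly from the quoted Table 8 figures. The sketch below only does the subtraction; the configuration labels are shorthand introduced here, not the paper's own row names.

```python
# mIoU values as quoted above from Thurachen et al., Table 8 (Section 5.1).
ablation = [
    ("xyz only", 0.643),
    ("xyz + Channel 1", 0.865),
    ("xyz + Channels 1-2", 0.937),
    ("xyz + Channels 1-3", 0.938),
]


def marginal_gains(steps):
    """Return (label, gain over the previous configuration), rounded to 3 decimals."""
    gains = []
    for (_, prev), (label, cur) in zip(steps, steps[1:]):
        gains.append((label, round(cur - prev, 3)))
    return gains


gains = marginal_gains(ablation)
for label, gain in gains:
    print(f"{label}: +{gain:.3f}")
```

Most of the spectral benefit comes from the first channel, a smaller share from the second, and almost none from the third.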
Reproducibility

While the preprocessing pipeline is documented in detail—including voxel downsampling to $2\,\text{cm}$, Statistical Outlier Removal ($k=10$), and Cloth Simulation Filter for ground segmentation—critical implementation details necessary for reproduction are missing. The authors do not release code or the annotated dataset, and key hyperparameters such as the specific learning rate schedule steps, the exact composition of mini-batches from the two training datasets (NS and HN), and data augmentation probabilities are unspecified. The sparse annotation strategy for HN ("only selected regions were manually labeled") lacks quantitative description of selection criteria or spatial distribution. Without access to the HeliALS-TW system (a custom Finnish Geospatial Research Institute platform), exact replication is impossible for external researchers, though the method should generalize to standard multispectral LiDAR data.
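
To make the first preprocessing step concrete, here is a minimal sketch of voxel downsampling at a 2 cm grid. It assumes each occupied voxel is replaced by the centroid of its points; whether the authors used the centroid or another representative is not stated in the review, and the function name is hypothetical. The Statistical Outlier Removal and Cloth Simulation Filter steps are not shown, and spectral attributes would need to be carried along in the same way as the coordinates.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size=0.02):
    """Average all points that fall into the same cubic voxel.

    points: iterable of (x, y, z) tuples in metres.
    voxel_size: voxel edge length in metres (0.02 m = 2 cm, as cited above).
    Centroid averaging is an assumption, not a detail confirmed by the paper.
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))
    reduced = []
    for pts in buckets.values():
        n = len(pts)
        reduced.append(tuple(sum(p[i] for p in pts) / n for i in range(3)))
    return reduced


# Two points share the first voxel; the third lies far away in its own voxel.
pts = [(0.001, 0.001, 0.001), (0.019, 0.019, 0.019), (0.5, 0.5, 0.5)]
reduced = voxel_downsample(pts)
```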

“We use Adaptive Moment Estimation with Decoupled Weight Decay Regularization (AdamW) with a learning rate of 0.001 and a weight decay of 0.075, using a cosine annealing schedule”
Thurachen et al., Section 4.2
“HN dataset was annotated using a sparse labeling approach, where only selected regions were manually labeled to reduce annotation costs while maintaining class diversity”
Thurachen et al., Section 3.5
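
The sketch below illustrates why the missing schedule details matter for reproduction. It assumes the common closed-form cosine annealing without restarts. Only the base learning rate of 0.001 comes from the paper; the total number of steps and the minimum learning rate are hypothetical values chosen for illustration, and the AdamW update itself (including the 0.075 weight decay) is not shown.

```python
import math


def cosine_annealed_lr(step, total_steps, base_lr=0.001, min_lr=0.0):
    """Learning rate at `step` under cosine annealing without restarts.

    base_lr=0.001 is the value quoted from the paper. total_steps and min_lr
    are hypothetical: the review notes the schedule details are unspecified.
    """
    step = min(max(step, 0), total_steps)  # clamp to the schedule range
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos_term


TOTAL_STEPS = 100  # hypothetical schedule length
schedule = [cosine_annealed_lr(s, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
```

Changing `TOTAL_STEPS` or `min_lr` changes the learning rate at every intermediate step, so two runs that both use "cosine annealing from 0.001" can still train quite differently.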
Abstract

Accurate land cover mapping in riverine environments is essential for effective river management, ecological understanding, and geomorphic change monitoring. This study explores the use of Point Transformer v2 (PTv2), an advanced deep neural network architecture designed for point cloud data, for land cover mapping through semantic segmentation of multispectral LiDAR data in real-world riverine environments. We utilize the geometric and spectral information from the 3-channel LiDAR point cloud to map land cover classes, including sand, gravel, low vegetation, high vegetation, forest floor, and water. The PTv2 model was trained and evaluated on point cloud data from the Oulanka river in northern Finland using both geometry and spectral features. To improve the model's generalization in new riverine environments, we additionally investigate multi-dataset training that adds sparsely annotated data from an additional river dataset. Results demonstrated that using the full-feature configuration resulted in performance with a mean Intersection over Union (mIoU) of 0.950, significantly outperforming the geometry baseline. Other ablation studies revealed that intensity and reflectance features were the key for accurate land cover mapping. The multi-dataset training experiment showed improved generalization performance, suggesting potential for developing more robust models despite limited high-quality annotated data. Our work demonstrates the potential of applying transformer-based architectures to multispectral point clouds in riverine environments. The approach offers new capabilities for monitoring sediment transport and other river management applications.
