SparseDVFS: Sparse-Aware DVFS for Energy-Efficient Edge Inference

cs.LG Ziyang Zhang, Zheshun Wu, Jie Liu, Luca Mottola · Mar 23, 2026
Plain-language introduction

SparseDVFS tackles energy-efficient DNN inference on edge devices by bridging the gap between coarse model-level and prohibitive operator-level DVFS. The core insight is using operator sparsity to distinguish compute-bound and memory-bound phases, applying specialized frequency triplets via a block-level strategy. A white-box offline modeler, greedy graph partitioner with amortization constraints, and unified co-governor with look-ahead pipelining collectively achieve substantial energy savings while managing switching overheads.

Critical review
Verdict
Bottom line

The paper presents a technically sound, three-tier solution to the granularity-overhead trade-off in DVFS for edge inference. The integration of an offline physics-based modeler, runtime greedy partitioner with latency amortization constraints, and unified co-governor effectively addresses the antagonistic effects between CPU/GPU controllers. The claimed 78.17% energy efficiency improvement over state-of-the-art is substantiated on real hardware (NVIDIA Jetson Orin Nano), though the evaluation scope is limited to a single platform and a narrow set of vision models.

“SparseDVFS achieves an average 78.17% energy efficiency gain over state-of-the-art solutions while maintaining a superior 14% cost-gain ratio”
SparseDVFS paper · Abstract
“frequency switching latencies can exceed the execution time of lightweight operators, causing transition penalties to dominate the timeline and negate energy savings”
SparseDVFS paper · Section 2.4
What holds up

The motivation and the characterization of DVFS switching overheads are rigorous. The authors empirically demonstrate that GPU frequency switching latency ranges from 5 ms to 10 ms (exceeding 20 ms at low frequencies), which far surpasses the execution time of lightweight operators such as ReLU and GELU. The Roofline model analysis separates operators into compute-bound (Conv2d) and memory-bound (activation/normalization) categories, providing a principled foundation for the sparsity-aware approach. The latency amortization constraint $T_{block} > N \times T_{switch}$ is a clean heuristic that reduces switching overhead by $7.0\times$ for ResNet-18 and $8.5\times$ for ViT-B16.
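The amortization constraint can be illustrated with a minimal sketch of greedy merging. This is not the paper's partitioner: the `Op` type, the operator latencies, the `memory_bound` labels, and the rule for folding a short tail into the previous block are all assumptions made for illustration. Only the constraint $T_{block} > N \times T_{switch}$, the value $N=5$, and the 5 ms lower bound on switching latency come from the review.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    latency_ms: float   # hypothetical profiled execution time
    memory_bound: bool  # hypothetical label, e.g. from a Roofline test


def greedy_partition(ops: List[Op], t_switch_ms: float, n: float) -> List[List[Op]]:
    """Merge consecutive operators into blocks with T_block > n * T_switch."""
    threshold = n * t_switch_ms
    blocks: List[List[Op]] = []
    current: List[Op] = []
    current_latency = 0.0
    for op in ops:
        # Close the current block only when it already amortizes a switch
        # and the next operator belongs to the other class.
        changes_class = bool(current) and op.memory_bound != current[-1].memory_bound
        if changes_class and current_latency > threshold:
            blocks.append(current)
            current, current_latency = [], 0.0
        current.append(op)
        current_latency += op.latency_ms
    if current:
        # A short tail cannot amortize a switch, so fold it into the previous block.
        if blocks and current_latency <= threshold:
            blocks[-1].extend(current)
        else:
            blocks.append(current)
    return blocks


# Illustrative operator sequence with made-up latencies.
example_ops = [
    Op("conv1", 30.0, False), Op("relu1", 0.5, True),
    Op("conv2", 30.0, False), Op("relu2", 0.5, True),
    Op("conv3", 30.0, False), Op("gelu", 0.5, True),
]
blocks = greedy_partition(example_ops, t_switch_ms=5.0, n=5)
for block in blocks:
    print([op.name for op in block], sum(op.latency_ms for op in block))
```

With these made-up numbers, operator-level DVFS would switch at every class change, while the merged blocks each run longer than $N \times T_{switch} = 25$ ms, so every remaining switch is amortized over at least that much work.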

Main concerns

The offline modeler's prediction accuracy (80.8% to 86.25% of predictions within $\pm 10\%$) introduces uncertainty that could violate strict real-time constraints in safety-critical applications, and the paper does not quantify the impact of mispredictions on end-to-end latency or energy. The aggregation factor $N=5$ appears empirically tuned, with no theoretical justification for why this specific value optimizes the Pareto frontier, and Figure 18 suggests sensitivity to this hyperparameter. Although the paper claims architectural adaptability, the evaluation is limited to ResNet and ViT on a single Jetson platform, leaving generalization to diverse sparsity patterns (structured vs. unstructured), dynamic shapes, or non-NVIDIA hardware unverified. The white-box modeler requires device-specific profiling that must be repeated for new hardware, which partly undercuts the scalability claims.

“the deterministic predictor achieves $\pm 10\%$ accuracies of 86.25%, 84.6%, 82.1%, and 80.8% for ResNet-18, ResNet-101, ViT-B16, and ViT-L16”
SparseDVFS paper · Table 1
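For readers unfamiliar with a "$\pm 10\%$ accuracy" figure, the sketch below shows one common way to compute it: the fraction of predictions whose relative error against the measured value is at most 10%. The review does not state the paper's exact definition or which quantity is being predicted, so the function name, the relative-error convention, and the sample values are assumptions.

```python
from typing import Sequence


def within_tolerance_rate(
    predicted: Sequence[float], measured: Sequence[float], tol: float = 0.10
) -> float:
    """Fraction of predictions with |pred - meas| <= tol * |meas|."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - m) <= tol * abs(m) for p, m in zip(predicted, measured))
    return hits / len(predicted)


# Made-up predicted vs. measured values (units are irrelevant to the metric).
predicted = [10.0, 12.0, 8.0, 20.0]
measured = [10.5, 10.0, 8.2, 20.0]
rate = within_tolerance_rate(predicted, measured)
print(f"{rate:.2%} of predictions fall within +/-10%")
```

Under this definition, a rate of 80.8% means roughly one prediction in five misses by more than 10%, which is why the size of those misses matters for the real-time concern raised above.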
Evidence and comparison

The evidence supports the core claim that block-level granularity outperforms both model-level (GearDVFS) and operator-level (Ascend-DVFS) approaches, with the 14% cost-gain ratio quantifying the efficiency trade-off. The ablation study in Section 6.6 isolates the contribution of the unified co-governor using the DOTA-v1.0 and VisDrone datasets, though it is unclear why these datasets differ from the ImageNet-2012 dataset used in the main evaluation. The comparison to Ascend-DVFS may not account for kernel fusion optimizations that could mitigate operator-level switching overheads. The thermal throttling analysis (Section 6.7) provides compelling evidence of stability benefits, but it lacks statistical measures of variance across multiple devices or ambient conditions.

“Ablation studies of each component of the unified co-governor on (a) DOTA-v1.0 and (b) VisDrone 2019 datasets, respectively”
SparseDVFS paper · Section 6.6
Reproducibility

The implementation description is reasonably detailed, specifying PyTorch 2.1, TensorRT execution queues, and the jetson_stats API for power monitoring. However, the paper contains no code repository link or artifact availability statement, which blocks independent reproduction of the super-block partitioning and frequency modulation results. The thermal-aware power model coefficients ($k_1, k_2$) and peak performance values ($\mathcal{P}_{peak}$, $\mathcal{B}_{mem}$) are device-specific constants derived from offline regression that are not provided, forcing reproduction efforts to reverse-engineer these parameters. The look-ahead pipeline mechanism relies on precise knowledge of $T_{switch}$, which is hardware-specific and measured via undisclosed benchmarks.

“SparseDVFS is implemented as a lightweight middleware... The offline modeler and the runtime graph partitioner are implemented in Python... The unified co-governor is implemented in C++”
SparseDVFS paper · Section 5
“The coefficients $k_1$ and $k_2$ are device-specific thermal constants derived from regression analysis of offline profiling data”
SparseDVFS paper · Section 4.1.2
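To show where the undisclosed constants $\mathcal{P}_{peak}$ and $\mathcal{B}_{mem}$ would enter a reproduction, here is a minimal sketch of the standard Roofline test. It assumes these symbols denote peak compute throughput and peak memory bandwidth, as in the usual Roofline model; the numeric values and operator costs are placeholders, not measurements from the paper, and the sketch does not attempt to model how sparsity changes the arithmetic intensity.

```python
def attainable_flops(intensity: float, p_peak: float, b_mem: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity * bandwidth)."""
    return min(p_peak, intensity * b_mem)


def is_compute_bound(flops: float, bytes_moved: float, p_peak: float, b_mem: float) -> bool:
    """An operator is compute-bound when its intensity reaches the ridge point."""
    intensity = flops / bytes_moved   # FLOP per byte
    ridge = p_peak / b_mem            # intensity at which the two roofs meet
    return intensity >= ridge


# Placeholder device constants (assumed, not taken from the paper).
P_PEAK = 1.0e12   # FLOP/s
B_MEM = 5.0e10    # bytes/s

# Placeholder operator costs: a dense convolution and an elementwise activation.
conv_is_compute_bound = is_compute_bound(1.0e9, 1.0e7, P_PEAK, B_MEM)
relu_is_compute_bound = is_compute_bound(1.0e6, 8.0e6, P_PEAK, B_MEM)
print("conv compute-bound:", conv_is_compute_bound)
print("relu compute-bound:", relu_is_compute_bound)
```

Because the classification depends only on the ratio $\mathcal{P}_{peak} / \mathcal{B}_{mem}$, a reproduction that measures these two values on its own device could at least recover the compute-bound versus memory-bound split, even without the paper's regression coefficients.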
Abstract

Deploying deep neural networks (DNNs) on power-sensitive edge devices presents a formidable challenge. While Dynamic Voltage and Frequency Scaling (DVFS) is widely employed for energy optimization, traditional model-level scaling is often too coarse to capture intra-inference variations, whereas fine-grained operator-level scaling suffers from prohibitive performance degradation due to significant hardware switching latency. This paper presents SparseDVFS, a fine-grained, sparse-aware DVFS framework designed for energy-efficient edge inference. Our key insight is that operator sparsity is a primary metric for hardware frequency modulation. By distinguishing between compute-bound dense operators and memory-bound sparse operators, the system can apply specialized frequency triplets to maximize energy efficiency. To overcome switching overheads and component interference, SparseDVFS incorporates three key innovations: (1) an offline modeler that establishes a deterministic mapping between operator sparsity and optimal frequency triplets (CPU/GPU/EMC) via white-box timeline analysis; (2) a runtime graph partitioner that utilizes a greedy merging heuristic to aggregate operators into super-blocks, balancing scaling granularity and DVFS switching latency through a latency amortization constraint; and (3) a unified co-governor that employs a frequency unified scaling engine (FUSE) and a look-ahead instruction queue to eliminate antagonistic effects between independent controllers and hide hardware transition latencies. Extensive evaluations show that SparseDVFS achieves an average 78.17% energy efficiency gain over state-of-the-art solutions while maintaining a superior 14% cost-gain ratio.
