The Average Relative Entropy and Transpilation Depth determines the noise robustness in Variational Quantum Classifiers
Variational Quantum Classifiers (VQCs) are typically trained in ideal classical simulations, raising concerns about reproducibility on noisy quantum hardware. This paper proposes that the average relative entropy between class distributions, combined with transpilation depth, predicts noise robustness, and it introduces the log-DTSAE metric to forecast accuracy degradation without requiring noisy hardware execution. The authors validate this across thousands of models spanning diverse ansatzes, encodings, and simulated backends from IBM, IQM, and IonQ.
The paper establishes a compelling empirical correlation between relative entropy, circuit depth, and noise-induced accuracy degradation. However, the titular claim that these factors "determine" noise robustness overstates the case: the paper demonstrates predictive correlation via Gaussian Process Regression, not causal determinism. While the metric offers a practical heuristic for circuit assessment, the theoretical grounding for its specific form remains limited; the paper cites De Palma et al.'s result that "the relative entropy decreases as it passes through a noisy channel" but does not derive why the logarithmic or square-root transformations optimally capture this relationship.
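For reference, the De Palma et al. statement quoted above appears to correspond to the standard data-processing inequality (monotonicity) of quantum relative entropy. Here $\mathcal{N}$ denotes an arbitrary noisy channel, i.e. a completely positive trace-preserving map; the symbol is an assumption for illustration and is not taken from the paper:

$$
D(\rho\|\sigma) = \mathrm{Tr}\left[\rho\,(\log\rho - \log\sigma)\right], \qquad D\big(\mathcal{N}(\rho)\,\|\,\mathcal{N}(\sigma)\big) \le D(\rho\|\sigma).
$$

The inequality only bounds how far the separation between class states can shrink under noise. It does not by itself single out a logarithm or a square root, which is the gap the critique points to.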
The extensive empirical evaluation is the paper's primary strength, covering over 8,500 trained models across 20+ ansatz structures, five dataset types, and multiple noisy backends. The central observation—that depth alone fails to predict noise resilience—is convincingly demonstrated: as noted in Section VI, "a circuit with a depth of 2033 showed a higher accuracy difference of 0.4... while another circuit with a depth of 2236 exhibited a much smaller accuracy difference of just 0.029." The proposed log-DTSAE metric ($\log(\text{depth}/\sqrt{D_{avg}})$) effectively stratifies model performance, and the authors honestly report negative results, noting that "No model with low accuracy difference... was observed for Amplitude encoding."
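A minimal sketch of the metric as quoted above, $\log(\text{depth}/\sqrt{D_{avg}})$, may make the depth and entropy trade-off concrete. The function name `log_dtsae`, the choice of natural logarithm, and the numeric inputs are all assumptions made for illustration; the review does not state the log base, and the values below are not taken from the paper.

```python
import math


def log_dtsae(transpiled_depth: int, d_avg: float) -> float:
    """Return log(depth / sqrt(D_avg)) for one model.

    Assumption: natural logarithm. The base is not given in the review.
    """
    if transpiled_depth <= 0:
        raise ValueError("transpiled_depth must be positive")
    if d_avg <= 0:
        raise ValueError("d_avg must be positive")
    return math.log(transpiled_depth / math.sqrt(d_avg))


# Hypothetical inputs: two models with the same transpiled depth but
# different average relative entropy between their class states.
same_depth = 2000
score_high_entropy = log_dtsae(same_depth, d_avg=4.0)
score_low_entropy = log_dtsae(same_depth, d_avg=0.01)

# At equal depth, the model with larger D_avg gets the smaller score.
print(score_high_entropy, score_low_entropy)
```

Under this form, two circuits of equal depth receive different scores whenever their class separation differs, which matches the review's point that depth alone does not predict noise resilience.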
The study relies predominantly on simulated "fake" backends, with real hardware validation limited to constrained experiments on IBM and IQM devices due to cost barriers. Computing the Average Relative Entropy $D_{avg}(\rho||\sigma) = \frac{1}{n+m}\sum_{i,j} D(U\rho_i U^\dagger||U\sigma_j U^\dagger)$ requires $\mathcal{O}(nm)$ pairwise evaluations, each obtained through quantum state tomography or simulation, which becomes prohibitive for large test sets. Furthermore, although the paper claims evaluation "without explicit executions on quantum devices," the method requires prior knowledge of the target device's transpilation depth and gate set, which may not be available a priori. The threshold values (e.g., log-DTSAE < 3.5) appear to be empirically fitted to simulated noise models rather than theoretically derived, raising questions about their validity under temporal drift on real hardware.
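The $\mathcal{O}(nm)$ pairwise cost can be illustrated with a simplified sketch. It assumes every state is diagonal in a common basis, so each state is just a probability vector and the quantum relative entropy reduces to the classical Kullback-Leibler divergence. That simplification, the function names, and the sample distributions are assumptions for illustration and not the paper's implementation. The sum is normalised by $n+m$ because that is how the formula is quoted in this review.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Relative entropy D(p||q) in nats.

    Equals the quantum relative entropy only when both states are
    diagonal in the same basis (the simplifying assumption here).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_k, q_k in zip(p, q):
        if p_k == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if q_k == 0.0:
            return math.inf  # support of p is not contained in support of q
        total += p_k * math.log(p_k / q_k)
    return total


def average_relative_entropy(
    class_a: Sequence[Sequence[float]],
    class_b: Sequence[Sequence[float]],
) -> float:
    """Sum D over all n*m cross-class pairs, divided by n + m as quoted."""
    n, m = len(class_a), len(class_b)
    if n == 0 or m == 0:
        raise ValueError("both classes need at least one state")
    pair_sum = sum(kl_divergence(p, q) for p in class_a for q in class_b)
    return pair_sum / (n + m)


# Hypothetical post-circuit output distributions for two classes.
class_a = [[0.9, 0.1], [0.8, 0.2]]
class_b = [[0.2, 0.8], [0.1, 0.9]]
d_avg = average_relative_entropy(class_a, class_b)
print(d_avg)
```

The nested loop visits every cross-class pair, so the number of relative-entropy evaluations grows as $nm$. For general, non-commuting states each evaluation would also need the full density matrices, which is where the tomography or simulation cost enters.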
The evidence robustly supports the claim that combining entropy with depth improves prediction over either factor individually, as shown in Table I, where models with similar depths diverge significantly based on entropy values. The comparison to prior work is generally fair, particularly in challenging Sharma et al.'s claims of depth-independent resilience with the observation that "circuit depth alone is insufficient to characterize shallow circuits." However, the paper's contradiction of Ahmed et al.'s probability distribution hypothesis is not backed by quantitative statistical analysis. The diversity of experimental controls (20+ ansatzes, multiple optimizers such as Adam and AdaGrad, and five distinct dataset generators) strengthens generalizability claims, though the >85% accuracy threshold for model selection may introduce survivorship bias toward well-behaved circuits.
The authors provide GitHub repositories for training, testing, and analysis code with specific commit references from March 2026. Experimental components are well-documented, including standard dataset generators (Make Classification, Hidden Manifold, Hyperplanes, Two Curves) and explicit ansatz definitions (QNN pooling, Data Re-upload). However, specific training hyperparameters (learning rates, batch sizes, convergence thresholds) are described only generally as "varying" rather than tabulated exhaustively. The transpilation depth calculations depend on Qiskit's transpiler version and optimization settings (not explicitly specified), which significantly affect quantum circuit depth. While the code appears complete, the computational requirements (8,500+ models trained on the Puhti Supercomputer) present a substantial barrier to independent verification. Limited validation on actual quantum devices (Fez, Torino, Marrakesh, Emerald) versus extensive simulations leaves open questions about threshold validity under real temporal noise fluctuations.
Variational Quantum Algorithms (VQAs) have been extensively researched for applications in Quantum Machine Learning (QML), Optimization, and Molecular simulations. Although designed for Noisy Intermediate-Scale Quantum (NISQ) devices, VQAs are predominantly evaluated classically due to uncertain results on noisy devices and limited resource availability, raising concern over the reproducibility of simulated VQAs on noisy hardware. While prior studies indicate that VQAs may exhibit noise resilience in specific parameterized shallow quantum circuits, there are no definitive measures to establish what defines a shallow circuit or the optimal circuit depth for VQAs on a noisy platform. These challenges extend naturally to Variational Quantum Classification (VQC) algorithms, a subclass of VQAs for supervised learning. In this article, we propose a relative entropy-based metric to verify whether a VQC model would perform similarly on a noisy device as it does on simulations. We establish a strong correlation between the average relative entropy difference in classes, transpilation circuit depth, and their performance difference on a noisy quantum device. Our results further indicate that circuit depth alone is insufficient to characterize shallow circuits. We present empirical evidence to support these assertions across a diverse array of techniques for implementing VQC, datasets, and multiple noisy quantum devices.