The Average Relative Entropy and Transpilation Depth determines the noise robustness in Variational Quantum Classifiers

quant-ph cs.LG Aakash Ravindra Shinde, Arianne Meijer - van de Griend, Jukka K. Nurminen · Mar 22, 2026

Plain-language introduction

Variational Quantum Classifiers (VQCs) are typically trained in ideal classical simulations, which raises concerns about whether their performance carries over to noisy quantum hardware. This paper proposes that the average relative entropy between class distributions, combined with transpilation depth, predicts noise robustness. It introduces the log-DTSAE metric to forecast accuracy degradation without executing on noisy hardware. The authors validate this across thousands of models spanning diverse ansatzes, encodings, and simulated backends from IBM, IQM, and IonQ.

Critical review

Verdict

Bottom line

The paper establishes a compelling empirical correlation between relative entropy, circuit depth, and noise-induced accuracy degradation. However, the titular claim that these factors "determine" noise robustness overstates the case—it establishes predictive correlation via Gaussian Process Regression rather than causal determinism. While offering a practical heuristic for circuit assessment, the theoretical grounding for the specific metric form remains limited; the paper cites De Palma et al.'s result that "the relative entropy decreases as it passes through a noisy channel" but does not derive why the logarithmic or square-root transformations optimally capture this relationship.

“the relative entropy decreases as it passes through a noisy channel”

De Palma et al., 2023 · Sec. II

“Values shown in green indicate that the accuracy of the noisy device differs from the accuracy of the simulation by less than 0.03”

This paper · Table I caption
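The bottom line above describes the paper's result as a predictive correlation fitted with Gaussian Process Regression. This review does not reproduce the paper's kernel, inputs, or hyperparameters, so the following is only a generic illustration. It is a from-scratch NumPy GP with an assumed squared-exponential (RBF) kernel and fixed, hand-picked hyperparameters. It regresses a target, such as the simulation-vs-device accuracy gap, on a single scalar feature, such as a log-DTSAE score. The function names are hypothetical, and the toy arrays are invented rather than taken from the paper.

```python
import numpy as np

def rbf_kernel(x1: np.ndarray, x2: np.ndarray, length_scale: float = 1.0,
               variance: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_fit_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and standard deviation of a GP fitted to centred targets.

    noise is the observation-noise variance added to the kernel diagonal.
    Hyperparameters are fixed here, not optimized.
    """
    y_mean = y_train.mean()
    K = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    L = np.linalg.cholesky(K)                        # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = K_s.T @ alpha + y_mean                    # posterior mean
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)          # posterior variance
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Toy usage with invented numbers (not the paper's data):
scores = np.array([2.5, 3.0, 3.5, 4.0, 4.5])        # e.g. log-DTSAE values
gaps = np.array([0.01, 0.02, 0.05, 0.12, 0.25])      # e.g. |sim acc - noisy acc|
pred_mean, pred_std = gp_fit_predict(scores, gaps, np.array([3.2, 6.0]),
                                     length_scale=0.8, noise=1e-4)
print(pred_mean, pred_std)
```

The predictive standard deviation grows away from the training inputs. This lets a GP flag scores outside the range it was fitted on. A real fit would learn the length scale and noise level from data instead of fixing them.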

What holds up

The extensive empirical evaluation is the paper's primary strength, covering over 8,500 trained models across 20+ ansatz structures, five dataset types, and multiple noisy backends. The central observation—that depth alone fails to predict noise resilience—is convincingly demonstrated: as noted in Section VI, "a circuit with a depth of 2033 showed a higher accuracy difference of 0.4... while another circuit with a depth of 2236 exhibited a much smaller accuracy difference of just 0.029." The proposed log-DTSAE metric ($\log(\text{depth}/\sqrt{D_{avg}})$) effectively stratifies model performance, and the authors honestly report negative results, noting that "No model with low accuracy difference... was observed for Amplitude encoding."

“a circuit with a depth of 2033 showed a higher accuracy difference of 0.4 between the simulation and the noisy device, while another circuit with a depth of 2236 exhibited a much smaller accuracy difference of just 0.029”

This paper · Section VI

“No model with low accuracy difference between simulated and noisy devices was observed for Amplitude encoding, as the transpilation depth for this method was notably high”

This paper · Section VI
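The metric quoted above is simple enough to state directly. The sketch below computes $\log(\text{depth}/\sqrt{D_{avg}})$ as written in this review. The natural logarithm is an assumption, because the base is not stated here. A cutoff such as the 3.5 value mentioned under the main concerns only carries over if the same base and the same $D_{avg}$ normalization are used. `log_dtsae` is a hypothetical helper name.

```python
import math

def log_dtsae(depth: float, d_avg: float, base: float = math.e) -> float:
    """log-DTSAE = log(depth / sqrt(D_avg)).

    depth: transpiled circuit depth on the target backend.
    d_avg: average relative entropy between the two classes.
    base:  logarithm base (natural log assumed by default).
    """
    if depth <= 0 or d_avg <= 0:
        raise ValueError("depth and d_avg must both be positive")
    return math.log(depth / math.sqrt(d_avg), base)

# Illustrative inputs (not from the paper): equal depth, different D_avg.
print(log_dtsae(2000, 1.0))    # weaker class separation -> higher score
print(log_dtsae(2000, 100.0))  # stronger class separation -> lower score
```

The score equals $\log(\text{depth}) - \tfrac{1}{2}\log D_{avg}$. Two circuits of equal depth therefore differ only through $D_{avg}$: a 100-fold larger $D_{avg}$ lowers the score by $\log 10$ in whatever base is used. This is how the metric can separate circuits whose depths alone look alike.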

Main concerns

The study relies predominantly on simulated "fake" backends, with real hardware validation limited to constrained experiments on IBM and IQM devices due to cost barriers. Computing the Average Relative Entropy $D_{avg}(\rho||\sigma) = \frac{1}{n+m}\sum_{i,j} D(U\rho_i U^\dagger||U\sigma_j U^\dagger)$ requires $\mathcal{O}(nm)$ pairwise relative-entropy evaluations, each backed by quantum state tomography or classical simulation. This becomes prohibitive for large test sets. Furthermore, while the paper claims evaluation "without explicit executions on quantum devices," the method requires prior knowledge of the target device's gate set and the resulting transpilation depth, which may not be available a priori. The threshold values (e.g., log-DTSAE < 3.5) appear to be fitted empirically to simulated noise models rather than derived theoretically. This raises questions about their validity under the temporal drift of real hardware.

“Most runs were performed on simulated-fake quantum backends, due to the high costs associated with running experiments on real quantum devices”

This paper · Section V

“$D_{avg}(\rho||\sigma) = \frac{1}{n+m}\sum_{i,j=0}^{n,m} D(U\rho_i U^\dagger||U\sigma_j U^\dagger)$”

This paper · Section III-B, Eq. 2
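The following minimal NumPy sketch implements Eq. 2 as quoted above, to make its cost and normalization concrete. It rests on three assumptions:

- The class states $\rho_i$ and $\sigma_j$ are already available as density matrices, for example from noiseless simulation.
- $D$ is the quantum relative entropy $D(\rho||\sigma) = \mathrm{Tr}[\rho(\ln\rho - \ln\sigma)]$, measured in nats.
- The function names are hypothetical.

The paper's exact state preparation and log base are not given here. The double loop makes the $\mathcal{O}(nm)$ pair count explicit. The $1/(n+m)$ prefactor is kept exactly as quoted, although a plain mean over the $nm$ pairs would divide by $nm$. The quantum relative entropy is invariant under a common unitary, so in this noiseless, full-state form each term equals $D(\rho_i||\sigma_j)$.

```python
import numpy as np

def relative_entropy(rho: np.ndarray, sigma: np.ndarray, tol: float = 1e-12) -> float:
    """Quantum relative entropy D(rho||sigma) = Tr[rho (ln rho - ln sigma)] in nats.

    Uses eigendecompositions instead of a matrix logarithm so that rank-deficient
    (e.g. pure) states are handled; returns inf when supp(rho) is not inside supp(sigma).
    """
    p, a = np.linalg.eigh(rho)     # rho   = sum_k p_k |a_k><a_k|
    q, b = np.linalg.eigh(sigma)   # sigma = sum_l q_l |b_l><b_l|
    p = np.clip(p, 0.0, None)      # remove tiny negative round-off
    nz = p > tol
    tr_rho_ln_rho = float(np.sum(p[nz] * np.log(p[nz])))  # convention 0 ln 0 = 0
    # Weight of rho on each eigenvector of sigma: w_l = sum_k p_k |<a_k|b_l>|^2
    w = p @ (np.abs(a.conj().T @ b) ** 2)
    if np.any((q <= tol) & (w > tol)):
        return float("inf")
    keep = w > tol
    tr_rho_ln_sigma = float(np.sum(w[keep] * np.log(q[keep])))
    return tr_rho_ln_rho - tr_rho_ln_sigma

def average_relative_entropy(rhos, sigmas, U: np.ndarray) -> float:
    """D_avg as quoted: (1/(n+m)) * sum over all n*m pairs of D(U rho_i U^dag || U sigma_j U^dag)."""
    n, m = len(rhos), len(sigmas)
    U_dag = U.conj().T
    total = 0.0
    for rho in rhos:               # n iterations ...
        for sigma in sigmas:       # ... times m: O(nm) relative entropies in total
            total += relative_entropy(U @ rho @ U_dag, U @ sigma @ U_dag)
    return total / (n + m)         # prefactor kept as quoted (a mean would use n * m)
```

Eigendecomposition is used rather than a general matrix logarithm because encoded data states are often pure or low-rank, and $\ln\rho$ is undefined on their kernel.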

Evidence and comparison

The evidence robustly supports the claim that combining entropy with depth improves prediction over either factor individually, as shown in Table I where models with similar depths diverge significantly based on entropy values. The comparison to prior work is generally fair, particularly challenging Sharma et al.'s claims of depth-independent resilience with the observation that "circuit depth alone is insufficient to characterize shallow circuits." However, the contradiction of Ahmed et al.'s probability distribution hypothesis lacks quantitative statistical analysis. The diversity of experimental controls—20+ ansatzes, multiple optimizers (Adam, AdaGrad, etc.), and five distinct dataset generators—strengthens generalizability claims, though the >85% accuracy threshold for model selection may introduce survivorship bias toward well-behaved circuits.

“From Table I, it is clear that there is no clear correlation between accuracy and either depth or average relative entropy by themselves as well”

This paper · Section V

“circuit depth alone is insufficient to characterize shallow circuits”

This paper · Section I
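The stratification discussed in this section can be checked with a simple tally. The sketch below assumes each trained model is summarized by three values: its simulated accuracy, its noisy-device accuracy, and a precomputed score such as log-DTSAE. It proceeds in three steps:

1. It applies the >85% selection filter. Treating this as a cut on simulated accuracy is an assumption.
2. It labels a model robust when the two accuracies differ by less than 0.03, the Table I criterion.
3. It counts how often `score < threshold` agrees with that label.

`ModelResult` and `stratify` are hypothetical names. The 3.5 default is the example cutoff cited in the main concerns, not a recommendation.

```python
from dataclasses import dataclass

@dataclass
class ModelResult:
    # Hypothetical per-model summary; field names are illustrative.
    sim_accuracy: float
    noisy_accuracy: float
    score: float  # e.g. a precomputed log-DTSAE value

def stratify(models, min_sim_acc=0.85, max_gap=0.03, score_threshold=3.5):
    """Apply the accuracy filter, then tally how the score threshold matches
    the 'robust' label (|sim - noisy| < max_gap) as a confusion count."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for m in models:
        if m.sim_accuracy <= min_sim_acc:
            continue  # selection filter: models at or below the cut are dropped
        robust = abs(m.sim_accuracy - m.noisy_accuracy) < max_gap
        predicted_robust = m.score < score_threshold
        key = ("t" if predicted_robust == robust else "f") + ("p" if predicted_robust else "n")
        counts[key] += 1
    return counts

# Invented records for illustration only:
demo = [ModelResult(0.90, 0.89, 3.0), ModelResult(0.90, 0.50, 3.0),
        ModelResult(0.80, 0.80, 1.0)]
print(stratify(demo))
print(stratify(demo, min_sim_acc=0.0))  # same tally without the selection filter
```

One way to probe the survivorship-bias concern raised above is to run the same tally with and without the accuracy filter.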

Reproducibility

The authors provide GitHub repositories for training, testing, and analysis code with specific commit references from March 2026. Experimental components are well-documented, including standard dataset generators (Make Classification, Hidden Manifold, Hyperplanes, Two Curves) and explicit ansatz definitions (QNN pooling, Data Re-upload). However, specific training hyperparameters (learning rates, batch sizes, convergence thresholds) are described only generally as "varying" rather than tabulated exhaustively. Transpilation depth depends on the Qiskit transpiler version and optimization settings. The paper specifies neither, and both can change the resulting circuit depth substantially. While the code appears complete, the computational requirements (8,500+ models trained on the Puhti Supercomputer) present a substantial barrier to independent verification. Validation on actual quantum devices (Fez, Torino, Marrakesh, Emerald) is limited compared with the extensive simulations. It therefore remains open whether the thresholds hold under real temporal noise fluctuations.

“VQC-QNN Training and Testing Code and Data [38]: https://github.com/AakashShindeHelsinki/VQC”

This paper · Section X

“Puhti Supercomputer was used for training these QML models”

This paper · Section IV
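The dependence of transpilation depth on the gate set and transpiler settings can be illustrated without Qiskit. The toy sketch below counts depth the usual way: each gate occupies one layer on every qubit it touches, and depth is the longest such chain. It then rewrites every CX into the standard identity H, CZ, H on the target qubit, as a device with a native CZ gate might require. This is not Qiskit's transpiler. There is no routing, gate cancellation, or optimization level, and the gate-tuple format and function names are hypothetical.

```python
def circuit_depth(gates):
    """Depth of a circuit given as (name, qubits) tuples in program order."""
    level = {}  # qubit -> index of the last layer occupied on that qubit
    depth = 0
    for _name, qubits in gates:
        layer = 1 + max(level.get(q, 0) for q in qubits)
        for q in qubits:
            level[q] = layer
        depth = max(depth, layer)
    return depth

def cx_to_cz_basis(gates):
    """Rewrite each CX(control, target) as H(target) CZ(control, target) H(target)."""
    out = []
    for name, qubits in gates:
        if name == "cx":
            control, target = qubits
            out += [("h", (target,)), ("cz", (control, target)), ("h", (target,))]
        else:
            out.append((name, qubits))
    return out

# A three-CNOT ladder on four qubits (illustrative circuit, not from the paper).
ladder = [("cx", (0, 1)), ("cx", (1, 2)), ("cx", (2, 3))]
print(circuit_depth(ladder))                  # 3
print(circuit_depth(cx_to_cz_basis(ladder)))  # 7
```

Here the same logical circuit more than doubles in depth purely from the basis change. For depth-based thresholds to be reproducible, the transpiler version, target gate set, and optimization settings therefore need to be pinned down.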

Abstract

Variational Quantum Algorithms (VQAs) have been extensively researched for applications in Quantum Machine Learning (QML), Optimization, and Molecular simulations. Although designed for Noisy Intermediate-Scale Quantum (NISQ) devices, VQAs are predominantly evaluated classically due to uncertain results on noisy devices and limited resource availability. This raises concern over the reproducibility of simulated VQAs on noisy hardware. While prior studies indicate that VQAs may exhibit noise resilience in specific parameterized shallow quantum circuits, there are no definitive measures to establish what defines a shallow circuit or the optimal circuit depth for VQAs on a noisy platform. These challenges extend naturally to Variational Quantum Classification (VQC) algorithms, a subclass of VQAs for supervised learning. In this article, we propose a relative entropy-based metric to verify whether a VQC model would perform similarly on a noisy device as it does on simulations. We establish a strong correlation between the average relative entropy difference in classes, transpilation circuit depth, and their performance difference on a noisy quantum device. Our results further indicate that circuit depth alone is insufficient to characterize shallow circuits. We present empirical evidence to support these assertions across a diverse array of techniques for implementing VQC, datasets, and multiple noisy quantum devices.
