Your paper timeline
Scroll AI takes the way you would scroll a great paper aggregator: quick signal first, deeper critique when something earns your attention, and challenges when a claim feels off.
3 papers in quant-ph
Trending mixes fresh papers with community signal.
cs.LG, quant-ph · Oscar Novo, Oscar Bastidas-Jossa, Alberto Calvo et al. · Mar 23, 2026

This paper investigates whether domain knowledge for quantum code generation should be embedded in model parameters through fine-tuning or provided at inference time via retrieval and agents. Comparing a parameter-specialized Granite-20B baseline against modern general-purpose LLMs (OpenAI, Claude, Gemini) on the Qiskit-HumanEval benchmark, the authors find that inference-time augmentation—particularly agentic execution feedback—outperforms fine-tuning by over 35 percentage points, offering a more maintainable path as quantum SDKs evolve.

Recent advances in large language models (LLMs) have enabled the automation of an increasing number of programming tasks, including code generation for scientific and engineering domains. In rapidly evolving software ecosystems such as quantum software development, where frameworks expose complex abstractions, a central question is how best to incorporate domain knowledge into LLM-based assistants while preserving maintainability as libraries evolve. In this work, we study specialization strategies for Qiskit code generation using the Qiskit-HumanEval benchmark. We compare a parameter-specialized fine-tuned baseline introduced in prior work against a range of recent general-purpose LLMs enhanced with retrieval-augmented generation (RAG) and agent-based inference with execution feedback. Our results show that modern general-purpose LLMs consistently outperform the parameter-specialized baseline. While the fine-tuned model achieves approximately 47% pass@1 on Qiskit-HumanEval, recent general-purpose models reach 60-65% under zero-shot and retrieval-augmented settings, and up to 85% for the strongest evaluated model when combined with iterative execution-feedback agents, representing an improvement of more than 20 percentage points over zero-shot general-purpose performance and more than 35 percentage points over the parameter-specialized baseline. Agentic execution feedback yields the most consistent improvements, albeit at increased runtime cost, while RAG provides modest and model-dependent gains. These findings indicate that performance gains can be achieved without domain-specific fine-tuning, instead relying on inference-time augmentation, thereby enabling a more flexible and maintainable approach to LLM-assisted quantum software development.
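The abstract does not spell out how the execution-feedback agents work, so the following is only a minimal sketch of the general idea: generate code, run it against a check, and feed any error text back into the next attempt. Every name here (`run_candidate`, `solve_with_feedback`, `toy_generate`) is hypothetical, and the toy generator stands in for a real LLM call. The check used for feedback is supplied by the caller, which may differ from the paper's setup.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_candidate(code: str, check: str) -> Optional[str]:
    """Run candidate code and a check in a fresh namespace.

    Returns None on success, or a short error trace on failure.
    Assumption: a real agent would run this in a sandbox, not with exec().
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(check, namespace)
    except Exception:
        return traceback.format_exc(limit=1)
    return None


def solve_with_feedback(
    generate: Callable[[str, Optional[str]], str],
    prompt: str,
    check: str,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask the generator for code, retrying with the last error as feedback."""
    feedback: Optional[str] = None
    for round_index in range(1, max_rounds + 1):
        code = generate(prompt, feedback)
        feedback = run_candidate(code, check)
        if feedback is None:
            return code, round_index  # passing code and the round it passed in
    return None, max_rounds  # no passing candidate within the budget


def toy_generate(prompt: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: buggy first, fixed once it sees an error."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


code, rounds = solve_with_feedback(
    toy_generate, "add two numbers", "assert add(2, 3) == 5"
)
```

Each extra round costs another generation and another execution, which is consistent with the runtime overhead the abstract attributes to the agentic setting.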
quant-ph, cs.LG · Harsh Wadhwa, Rahul Bhowmick, Naipunnya Raj et al. · Mar 23, 2026

Quantum machine learning model selection currently lacks principled guidelines, forcing practitioners to train numerous expensive configurations. This paper introduces QBET (Quantum Bias-Expressivity Toolbox), an unsupervised pre-screening framework that evaluates hybrid quantum-classical transformers using LZ-complexity-based Simplicity Bias (AUC) and Expressivity metrics without gradient descent. The core idea is that architectures with higher AUC (stronger bias toward simple Boolean functions) correlate with better downstream task performance, offering a filter to identify promising quantum attention variants before committing to full training on NISQ devices.

Quantum machine learning models generally lack principled design guidelines, often requiring full, resource-intensive training across numerous choices of encodings, quantum circuit designs, and initialization strategies to find an effective configuration. To address this challenge, we develop the Quantum Bias-Expressivity Toolbox ($\texttt{QBET}$), a framework for evaluating quantum, classical, and hybrid transformer architectures. In this toolbox, we introduce lean metrics for Simplicity Bias ($\texttt{SB}$) and Expressivity ($\texttt{EXP}$) for comparing across various models, and extend the analysis of $\texttt{SB}$ to generative and multiclass-classification tasks. We show that $\texttt{QBET}$ enables efficient pre-screening of promising model variants, obviating the need to execute complete training pipelines. In evaluations on transformer-based classification and generative tasks, we employ a total of $18$ qubits for embeddings ($6$ qubits each for query, key, and value). We identify scenarios in which quantum self-attention variants surpass their classical counterparts by ranking the respective models according to the $\texttt{SB}$ metric and comparing their relative performance.
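To make the idea of a training-free simplicity-bias probe concrete, here is a rough sketch under stated assumptions. It is not QBET's metric: the excerpt does not define SB or its AUC. The sketch samples random parameters for a toy classical threshold unit (a stand-in for a model), reads off the Boolean function it computes as a truth-table string, and scores that string with a simple LZ78-style phrase count as a complexity proxy. The function names and the toy model are hypothetical.

```python
import itertools
import random
from typing import List, Sequence


def lz_phrase_count(bits: str) -> int:
    """LZ78-style complexity proxy: number of phrases in an incremental parse."""
    seen = set()
    phrase = ""
    for b in bits:
        phrase += b
        if phrase not in seen:
            seen.add(phrase)
            phrase = ""
    # A trailing, already-seen phrase still counts as one phrase.
    return len(seen) + (1 if phrase else 0)


def truth_table(weights: Sequence[float], bias: float, n_inputs: int) -> str:
    """Output bits of a threshold unit over all 2**n_inputs Boolean inputs."""
    out = []
    for x in itertools.product((0, 1), repeat=n_inputs):
        activation = sum(w * xi for w, xi in zip(weights, x)) + bias
        out.append("1" if activation > 0 else "0")
    return "".join(out)


def sample_complexities(n_inputs: int = 4, n_samples: int = 200, seed: int = 0) -> List[int]:
    """Complexities of functions obtained from randomly drawn parameters (no training)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        weights = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        bias = rng.gauss(0.0, 1.0)
        scores.append(lz_phrase_count(truth_table(weights, bias, n_inputs)))
    return scores


scores = sample_complexities()
```

A histogram of `scores` that is skewed toward low values would indicate a bias toward simple functions for this toy model. How that distribution is turned into a single score is where a specific metric such as QBET's would come in.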
quant-ph, cs.LG · Aakash Ravindra Shinde, Arianne Meijer-van de Griend, Jukka K. Nurminen · Mar 22, 2026

Variational Quantum Classifiers (VQCs) are typically trained in ideal classical simulations, raising concerns about reproducibility on noisy quantum hardware. This paper proposes that the average relative entropy between class distributions, combined with transpilation depth, predicts noise robustness, and it introduces the log-DTSAE metric to forecast accuracy degradation without requiring noisy hardware execution. The authors validate this across thousands of models spanning diverse ansatzes, encodings, and simulated backends from IBM, IQM, and IonQ.

Variational Quantum Algorithms (VQAs) have been extensively researched for applications in Quantum Machine Learning (QML), Optimization, and Molecular simulations. Although designed for Noisy Intermediate-Scale Quantum (NISQ) devices, VQAs are predominantly evaluated classically due to uncertain results on noisy devices and limited resource availability, raising concern over the reproducibility of simulated VQAs on noisy hardware. While prior studies indicate that VQAs may exhibit noise resilience in specific parameterized shallow quantum circuits, there are no definitive measures to establish what defines a shallow circuit or the optimal circuit depth for VQAs on a noisy platform. These challenges extend naturally to Variational Quantum Classification (VQC) algorithms, a subclass of VQAs for supervised learning. In this article, we propose a relative entropy-based metric to verify whether a VQC model would perform similarly on a noisy device as it does on simulations. We establish a strong correlation between the average relative entropy difference in classes, transpilation circuit depth, and their performance difference on a noisy quantum device. Our results further indicate that circuit depth alone is insufficient to characterize shallow circuits. We present empirical evidence to support these assertions across a diverse array of techniques for implementing VQC, datasets, and multiple noisy quantum devices.
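The abstract does not give the exact form of the relative-entropy metric, so the sketch below only illustrates the underlying quantity: the relative entropy (Kullback-Leibler divergence) $D(p\,\|\,q) = \sum_i p_i \log(p_i / q_i)$ between per-class outcome distributions, averaged over ordered class pairs. The function names, the smoothing constant `eps`, and the example distributions are assumptions made for illustration, not values from the paper.

```python
import math
from itertools import permutations
from typing import Dict, Sequence


def relative_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL divergence D(p || q) in nats; eps avoids division by zero when q_i = 0."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def average_pairwise_relative_entropy(class_dists: Dict[str, Sequence[float]]) -> float:
    """Mean of D(p_a || p_b) over all ordered pairs of distinct classes.

    Expects at least two classes, each given as a probability vector
    over the same set of measurement outcomes.
    """
    pairs = list(permutations(class_dists, 2))
    total = sum(relative_entropy(class_dists[a], class_dists[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical outcome distributions for two classes over four bitstrings.
example = {
    "class_0": [0.70, 0.10, 0.10, 0.10],
    "class_1": [0.10, 0.10, 0.10, 0.70],
}
separation = average_pairwise_relative_entropy(example)
```

A larger value means the class distributions are easier to tell apart. Intuitively, well-separated classes leave more margin before noise blurs them together, which matches the role the abstract assigns to this quantity alongside transpiled circuit depth.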