Quantum machine learning model selection currently lacks principled guidelines, forcing practitioners to train numerous expensive configurations. This paper introduces QBET (Quantum Bias-Expressivity Toolbox), an unsupervised pre-screening framework that evaluates hybrid quantum-classical transformers using LZ-complexity-based Simplicity Bias (AUC) and Expressivity metrics without gradient descent. The core idea is that architectures with higher AUC (stronger bias toward simple Boolean functions) correlate with better downstream task performance, offering a filter to identify promising quantum attention variants before committing to full training on NISQ devices.
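The summary above does not say which Lempel-Ziv variant or normalization QBET uses, so the following is only a minimal sketch of one common choice: the LZ76 phrase count of a binary string, such as the truth table of a Boolean function. The function name `lz76_phrase_count` is hypothetical. Lower counts indicate simpler (more compressible) outputs.

```python
def lz76_phrase_count(s: str) -> int:
    """Count phrases in the LZ76 parsing of s (illustrative, not QBET's code).

    Each phrase is extended for as long as it already occurs in the text
    seen so far (overlap allowed); it ends at the first new symbol or at
    the end of the string.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the candidate phrase while it can be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


# A constant truth table parses into few phrases; an irregular one into more.
constant = lz76_phrase_count("0" * 16)
irregular = lz76_phrase_count("0001101001000101")
```

Under this sketch, an architecture whose sampled Boolean outputs concentrate on low phrase counts would count as more simplicity-biased; how the paper aggregates such counts into its AUC score is not specified here.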
AdditiveLLM2 is a domain-adapted multi-modal LLM for additive manufacturing built by fine-tuning Gemma 3 12B on ~50 million tokens from open-access AM journal articles. The work addresses the challenge of specializing general LLMs for technical domains without consuming context window space (as with RAG) or requiring massive datasets. Using domain adaptive pretraining (DAPT) for both text and vision plus visual instruction tuning (VIT), the authors demonstrate that even relatively small curated datasets can yield domain expertise exceeding 90% accuracy on AM knowledge tasks.
While most bias mitigation research targets binary classification, multi-class fairness remains under-explored. This paper proposes Generalised Exponentiated Gradient (GEG), an in-processing method that extends the Exponentiated Gradient framework to multi-class settings and enables simultaneous optimization of multiple fairness constraints via positive-label moment conditions. Evaluated on ten datasets against six baselines, GEG achieves fairness improvements up to 92% with moderate accuracy trade-offs, filling a critical gap in fair machine learning toolboxes.
RAMPAGE addresses discretization bias in Extragradient (EG) methods for variational inequalities by replacing the deterministic midpoint with randomized sampling. The core idea uses uniform sampling to construct an unbiased estimator of the continuous-time flow integral, while RAMPAGE+ leverages antithetic variates to eliminate first-order variance terms. This matters for training GANs and other non-conservative games where EG's $\mathcal{O}(\eta^2)$ bias causes divergence in highly nonlinear regimes.
This paper addresses the high computational cost of deploying Large Language Models (LLMs) in resource-constrained environments by introducing the Performance-Efficiency Ratio (PER), a novel metric that integrates accuracy, throughput, memory, and latency via geometric mean normalization. The authors evaluate 16 open-source language models ranging from 0.5B to 72B parameters across five NLP tasks (IMDB, HellaSwag, ARC-Easy, SQuAD 2.0, and GSM8K), concluding that small models (0.5–3B parameters) consistently achieve superior PER scores compared to their larger counterparts.
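To make the geometric-mean idea concrete, here is a minimal sketch. The normalization against a reference configuration, the inversion of memory and latency (so that lower is better), and the names `per_score` and `ref` are all assumptions for illustration; the summary does not give the paper's exact formula.

```python
import math


def per_score(accuracy, throughput, memory, latency, ref):
    """Geometric mean of four normalized terms (hypothetical PER variant).

    `ref` holds reference values used for normalization. Memory and
    latency are inverted because smaller values are better.
    """
    terms = [
        accuracy / ref["accuracy"],
        throughput / ref["throughput"],
        ref["memory"] / memory,
        ref["latency"] / latency,
    ]
    return math.prod(terms) ** (1.0 / len(terms))


ref = {"accuracy": 0.80, "throughput": 100.0, "memory": 16.0, "latency": 0.50}
# Same accuracy and throughput as the reference, half the memory and latency.
small_model = per_score(0.80, 100.0, 8.0, 0.25, ref)
```

Because the terms multiply, a model cannot offset a very poor score on one axis with a large score on another as easily as it could under an arithmetic mean.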
Virtual cell modeling aims to simulate cellular responses to drug perturbations in silico, but existing flow-matching models optimize only pixel-level reconstruction and can produce biologically implausible outputs like nuclei outside cytoplasm. CellFluxRL addresses this by post-training the state-of-the-art CellFlux model with reinforcement learning, using seven manually designed reward functions spanning biological function (mode of action), structural validity (nuclear containment), and morphological statistics (size/count). The approach provides a systematic framework for enforcing physical constraints through differentiable optimization, achieving consistent improvements across all biological metrics while maintaining image quality.
This paper investigates how vision-language models (VLMs) perform spatial reasoning—the binding of objects to spatial relations. It reveals that VLMs rely on two concurrent mechanisms: a dominant one where the vision encoder encodes object layout globally across visual tokens (extending into background regions), and a secondary one where the language model backbone forms ordering representations over object tokens. The finding that enhancing these vision-derived spatial representations improves performance without fine-tuning challenges the prevailing focus on LM backbones and highlights the critical role of vision encoders in multimodal reasoning.
This paper proposes a bold interdisciplinary bridge between holographic string dualities and artificial intelligence, hypothesizing that AI tasks such as language modeling can be viewed as particle trajectory prediction on graphs admitting a holographically dual "string" description. Drawing on the AdS/CFT correspondence, the authors conjecture that word metrics on $S_n$ Cayley graphs correspond to areas under lattice paths in dual planar polygons, verified computationally via their CayleyPy library.
The paper tackles partition-constrained subset selection for 'close-to-submodular' objectives (specifically $\alpha$-weakly DR-submodular and $(\gamma,\beta)$-weakly submodular functions), where existing distorted local-search methods suffer from prohibitive query complexity ($\tilde{O}(1/\epsilon^6)$) and require prior knowledge of structural parameters. The authors propose the Multinoulli Extension (ME), a continuous relaxation that learns multinoulli priors for each partition block, enabling lossless rounding without submodularity assumptions. They develop offline (Multinoulli-SCG) and online (Multinoulli-OSCG/OSGA) algorithms achieving tight approximation guarantees with $O(1/\epsilon^2)$ query complexity and $O(\sqrt{T})$ regret, respectively.
FluidWorld tackles the quadratic cost and lack of spatial inductive bias in Transformer-based world models by replacing self-attention with reaction-diffusion PDEs. The core innovation is using PDE integration itself—governed by a discretized Laplacian and learned reaction terms—as the predictive engine, rather than as a physical simulator. This proof-of-concept demonstrates that at $\sim$800K parameters, such physics-inspired dynamics match or exceed attention and convolutional recurrence on spatial coherence metrics while offering $O(N)$ complexity, though at slower training speeds.
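The following is a minimal sketch of one explicit Euler step of a reaction-diffusion update, to show why the cost is linear in the number of cells. It is one-dimensional with periodic boundaries, and the reaction term is a hand-written placeholder; FluidWorld learns its reaction terms, and its actual grid, boundary handling, and integrator are not described in the summary. All names are hypothetical.

```python
def laplacian_1d(u):
    """Discretized Laplacian with periodic boundaries (an assumption)."""
    n = len(u)
    return [u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n] for i in range(n)]


def rd_step(u, diffusion, reaction, dt):
    """One Euler step of du/dt = diffusion * Laplacian(u) + reaction(u).

    Each cell reads only its two neighbours, so a step costs O(N).
    """
    lap = laplacian_1d(u)
    return [ui + dt * (diffusion * li + reaction(ui)) for ui, li in zip(u, lap)]


def no_reaction(v):
    """Placeholder reaction term; a learned function would go here."""
    return 0.0


state = [0.0, 0.0, 1.0, 0.0, 0.0]
next_state = rd_step(state, diffusion=0.1, reaction=no_reaction, dt=1.0)
```

With the reaction switched off, the step only spreads the initial spike to its neighbours and conserves the total, which is the local, spatially biased behaviour the summary contrasts with global self-attention.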
HamVision proposes using damped harmonic oscillator dynamics as a structured inductive bias for medical image analysis. The core idea is that phase-space decomposition yields three representations (position $q$ for features, momentum $p$ for gradients, and energy $H = \frac{1}{2}|z|^2$ for saliency) that serve both segmentation and classification tasks without modifying the shared bottleneck. This physics-constrained approach aims to replace generic learned transformations with interpretable, dynamics-based feature extraction across diverse medical imaging modalities.
This paper tackles the memory explosion problem in high-rank DoRA fine-tuning. At $d_{in}=8192$ and rank $r=384$, computing the row-wise norm $\|\mathbf{W}+s\mathbf{B}\mathbf{A}\|_{\text{row}}$ via standard materialization consumes ~512 MB per module—prohibitive for large models with hundreds of adapted layers. The authors propose a factored norm decomposition that reduces the computation to $\mathcal{O}(d_{out}r+r^2)$ intermediates plus fused Triton kernels that collapse the composition into a single pass. On 8–32B vision-language models, this yields 1.5–2.0× speedups and up to 77 GB VRAM savings without numerical drift.
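One decomposition consistent with the stated $\mathcal{O}(d_{out}r+r^2)$ intermediate size expands the squared row norm as $\|\mathbf{W}_i\|^2 + 2s\,(\mathbf{W}\mathbf{A}^\top)_i \cdot \mathbf{B}_i + s^2\,\mathbf{B}_i(\mathbf{A}\mathbf{A}^\top)\mathbf{B}_i^\top$, so the $d_{out} \times d_{in}$ product $\mathbf{B}\mathbf{A}$ is never formed. The sketch below checks that identity on tiny matrices in plain Python. It is an illustration under that assumption, not the authors' Triton kernels, and all function names are hypothetical.

```python
import math


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def row_norms_materialized(W, B, A, s):
    """Baseline: builds the full d_out x d_in matrix B @ A."""
    BA = matmul(B, A)
    return [math.sqrt(sum((w + s * d) ** 2 for w, d in zip(wr, dr)))
            for wr, dr in zip(W, BA)]


def row_norms_factored(W, B, A, s):
    """Same norms using only a d_out x r and an r x r intermediate."""
    At = transpose(A)
    WA = matmul(W, At)  # d_out x r
    G = matmul(A, At)   # r x r Gram matrix of A
    norms = []
    for w_row, b_row, wa_row in zip(W, B, WA):
        w_sq = sum(w * w for w in w_row)  # could be cached if W is frozen
        cross = sum(b * c for b, c in zip(b_row, wa_row))
        gb = [sum(g * b for g, b in zip(g_row, b_row)) for g_row in G]
        quad = sum(b * v for b, v in zip(b_row, gb))
        # Clamp at zero to guard against tiny negative rounding errors.
        norms.append(math.sqrt(max(w_sq + 2.0 * s * cross + s * s * quad, 0.0)))
    return norms


W = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
B = [[1.0, 0.0], [2.0, -1.0]]
A = [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.6]]
dense = row_norms_materialized(W, B, A, 0.7)
factored = row_norms_factored(W, B, A, 0.7)
```

At the sizes quoted above ($d_{in}=8192$, $r=384$), the intermediates in the factored path scale with $r$ rather than with $d_{in}$, which is where the memory saving would come from.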
This paper addresses uncertainty quantification (UQ) for distribution-to-distribution flow matching, a setting where models map between well-defined source and target distributions (e.g., unperturbed to drug-treated cell images) rather than noise-to-data. The authors propose Bayesian Stochastic Flow Matching (BSFM), which combines Stochastic Flow Matching (SFM) for capturing aleatoric uncertainty via learnable diffusion terms, with MCD-Antithetic—a scalable Bayesian method using Monte Carlo Dropout and antithetic sampling—to decompose total uncertainty into aleatoric and epistemic components for reliable out-of-distribution (OOD) detection in scientific imaging.
This paper tackles the Long-to-Short (L2S) model merging problem: combining a base LLM with a long-chain-of-thought reasoning model to preserve accuracy while drastically reducing output length. The core contribution is a theoretical framework proving that merging error is bounded by the per-layer Hessian norm (Proposition 1), which motivates using the diagonal Fisher Information Matrix (FIM) as a data-free proxy for assigning layer-adaptive merging coefficients. The resulting FIM-TIES method achieves state-of-the-art results on 5 of 6 benchmarks without requiring any domain-specific calibration data.
Hard-exploration problems in RL—such as Montezuma’s Revenge and sparse-reward robotic control—require finding rare trajectories where standard RL fails. This paper argues that using policy optimization to maximize intrinsic rewards is unnecessarily inefficient for mere state coverage. Instead, it proposes Go-With-Uncertainty (GowU), a tree-search method that decouples exploration from exploitation: it uses epistemic uncertainty to drive a Go-With-The-Winner particle population search, then distills discovered trajectories via supervised backward learning. The approach achieves state-of-the-art scores on hard Atari benchmarks with an order of magnitude fewer environment interactions than intrinsic-motivation baselines, and solves high-dimensional continuous-control tasks (Adroit, AntMaze) from pixels without demonstrations.
This paper introduces PnPMass, a plug-and-play framework for weak lensing mass mapping that reconciles reconstruction accuracy with the practical deployment constraints of upcoming Stage-IV surveys. The key innovation is a carefully chosen data-fidelity operator that decouples denoiser training from observation-specific noise statistics, enabling a single trained model to handle varying survey conditions without retraining. Coupled with moment-network-based uncertainty quantification and conformal calibration, the method offers fast inference with coverage guarantees, addressing limitations of both end-to-end deep learning and costly MCMC sampling approaches.
MindTS tackles multimodal time series anomaly detection by fusing numerical time series with text from two sources: endogenous text (LLM-generated descriptions of patch statistics) and exogenous text (external reports). The core idea is to align these heterogeneous modalities via contrastive learning and filter textual redundancy using an Information Bottleneck-inspired content condenser before cross-modal reconstruction. This matters because real-world anomalies often manifest in contextual text (e.g., policy changes affecting stock prices) that pure numerical models miss.
dynActivation addresses the rigidity of fixed activation functions by introducing per-layer trainable scalars that interpolate between a base nonlinearity and a linear path. The method adds only two parameters per layer ($\alpha_i$ and $\beta_i$) via $f_i(x) = \text{BaseAct}(x)(\alpha_i - \beta_i) + \beta_i x$, allowing adaptive nonlinearity allocation across depth. Reported results include gains on vision benchmarks (+14% on CIFAR-10), robustness to extreme depth scaling (95%+ accuracy on 75-layer MNIST), and faster convergence (24% AUC reduction), though LLM perplexity gains vanish in long-run training.
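The formula above can be written out directly. This is a minimal scalar sketch: the choice of `tanh` as the base activation and the function name are assumptions, and in practice $\alpha_i$ and $\beta_i$ would be trainable per-layer parameters rather than plain arguments.

```python
import math


def dyn_activation(x, alpha, beta, base_act=math.tanh):
    """f(x) = BaseAct(x) * (alpha - beta) + beta * x for a single layer."""
    return base_act(x) * (alpha - beta) + beta * x


# alpha = 1, beta = 0 recovers the base nonlinearity unchanged.
purely_nonlinear = dyn_activation(0.5, alpha=1.0, beta=0.0)
# alpha = beta removes the nonlinear term and leaves the linear path beta * x.
purely_linear = dyn_activation(0.5, alpha=1.0, beta=1.0)
```

The two limiting cases show how a layer can slide between fully nonlinear and fully linear behaviour with only two scalars.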
This paper investigates a fundamental failure mode in learning systems: when feedback reliability is unobservable (latent), standard algorithms can converge stably to systematically incorrect solutions while exhibiting normal optimization behavior (decreasing loss, vanishing gradients). The authors formalize this as a scale-dependent identifiability problem—single-step feedback is insufficient to distinguish reliable from biased experience, yet trajectory-level statistics carry separable signals. They propose the Monitor–Trust–Regulator (MTR) framework, which maintains a slow-timescale trust variable inferred from learning dynamics to modulate updates, enabling recovery from persistent bias.
This paper addresses the challenge of detecting network attacks in IoT environments while preserving data privacy and minimizing communication overhead. The authors propose a federated learning framework using lightweight autoencoders deployed directly on Raspberry Pi edge devices to detect anomalies in real-time through reconstruction error $\mathcal{E}(t)=\|x_{t}-\hat{x}_{t}\|^{2}$. A real-world testbed with ZigBee-enabled sensor nodes was constructed to evaluate the approach against redirection attacks, demonstrating that federated training can match centralized performance while significantly reducing data transmission from 4.5 MB to 378 KB.
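The reconstruction-error rule can be sketched as follows. The autoencoder itself is omitted, so `x_hat` stands in for its output, and the fixed `threshold` is a hypothetical parameter; the summary does not say how the authors choose their decision threshold.

```python
def reconstruction_error(x, x_hat):
    """Squared Euclidean distance between a sample and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def is_anomaly(x, x_hat, threshold):
    """Flag a sample whose reconstruction error exceeds the threshold."""
    return reconstruction_error(x, x_hat) > threshold


# A well-reconstructed reading versus one the model fails to reproduce.
normal = is_anomaly([1.0, 2.0], [1.0, 2.1], threshold=0.5)
attack = is_anomaly([1.0, 2.0], [1.0, 4.0], threshold=0.5)
```

Because the check needs only the local model and the current sample, it can run on the edge device without sending raw sensor data to a server.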