As users increasingly consult multiple large language models for decision support, a critical question arises: does increasing the number of AI advisors improve accuracy or amplify harmful conformity pressures? This paper investigates how panel size, within-panel consensus, and human-likeness of presentation shape human reliance and decision accuracy across three prediction tasks (income, recidivism, and dating). Through two crowdsourced experiments with 348 participants, the authors reveal a surprising non-monotonic relationship: three AI advisors improve accuracy over a single advisor, but five provide no additional benefit, while unanimous consensus fosters overreliance and wide disagreement creates confusion.
The paper challenges the rapid shift toward Vision Transformer-based continual learning by demonstrating that lightweight, pruned Convolutional Networks can outperform existing foundation model approaches. The authors propose Pruned Adaptation Modules (PAM), which freeze early ResNet layers and introduce sparsely structured task-specific modules, yielding significant parameter reductions while improving accuracy. This work fills a critical methodological gap by establishing a strong, efficient baseline that questions whether recent advances reflect genuine progress or merely the absence of rigorous ConvNet comparisons.
This paper benchmarks classical statistical models (LR, SARIMAX), deep learning approaches (MLP, LSTM), and physics-guided variants for multi-horizon AQI forecasting in Dallas County, North Texas. The core innovation is incorporating EPA breakpoint-based AQI formulations as consistency constraints via weighted loss functions ($\mathcal{L}_{total} = \lambda_{data}\mathcal{L}_{data} + \lambda_{phys}\mathcal{L}_{phys}$). The work addresses a practical need for standardized regional model comparison to guide public health decision-making.
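One plausible reading of the weighted loss in the AQI forecasting summary above is sketched here. It uses the standard EPA piecewise-linear index formula as the consistency term. The function names, the $\lambda$ values, and the shortened breakpoint table are assumptions for illustration, not details taken from the paper. The PM2.5 breakpoints follow the pre-2024 EPA table and should be checked against the current one.

```python
# Illustrative (C_lo, C_hi, I_lo, I_hi) rows for 24-hour PM2.5 in ug/m^3.
# ASSUMPTION: pre-2024 EPA values, truncated to four categories for brevity.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
]


def aqi_from_concentration(c, breakpoints=PM25_BREAKPOINTS):
    """Piecewise-linear AQI: I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo.

    Assumes c is already truncated to one decimal place, as EPA reporting does.
    """
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= c <= c_hi:
            return (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
    raise ValueError(f"concentration {c} is outside the illustrative table")


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(aqi_pred, aqi_true, conc_pred, lam_data=1.0, lam_phys=0.1):
    """L_total = lam_data * L_data + lam_phys * L_phys (hypothetical weights)."""
    # Data term: predicted AQI against observed AQI.
    l_data = mse(aqi_pred, aqi_true)
    # Consistency term: predicted AQI against the AQI implied by the
    # predicted pollutant concentrations through the breakpoint formula.
    aqi_implied = [aqi_from_concentration(c) for c in conc_pred]
    l_phys = mse(aqi_pred, aqi_implied)
    return lam_data * l_data + lam_phys * l_phys
```

In this sketch the consistency term is zero whenever the predicted index agrees with the predicted concentration, so $\lambda_{phys}$ only penalizes forecasts that are internally inconsistent.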
This paper compares classical machine learning methods (Linear Regression, SVM, Logistic Regression) for predicting vehicle fuel consumption using the 1974 Motor Trend dataset (N=398). The author argues that these "interpretable" models outperform "black box" deep learning approaches for static physical datasets—a claim that relies on a false equivalence between 50-year-old tabular data and modern time-series telematics applications.
This exploratory study investigates using TabPFN—a transformer-based tabular foundation model—and its extension library for geotechnical site characterization. The core idea is to leverage in-context learning to perform soil classification and multivariate parameter imputation without model retraining or hyperparameter tuning, while obtaining interpretable insights through embeddings, posterior distributions, and SHAP analysis. This matters because geotechnical engineering requires uncertainty-aware, interpretable predictions for safety-critical decisions, yet faces severe data scarcity.
The paper proposes CEBaG, a deterministic hallucination detection method for medical Visual Question Answering that eliminates the need for costly stochastic sampling. By combining token-level predictive variance with visual evidence magnitude derived from log-probabilities, the method detects when models generate responses that contradict input images. This approach achieves superior detection accuracy while reducing computational cost from 20+ generations to just three forward passes, addressing a critical safety bottleneck in clinical AI deployment.
Dyadic is a web-based platform for studying human-human and human-AI conversations through text or voice-based interaction. It attempts to solve the methodological gap in conversation research by providing turnkey tools for experimental manipulation, live monitoring, and in-situ survey delivery during ongoing chats. The core value proposition is lowering barriers to entry for researchers studying dyadic interaction processes without requiring programming expertise.
This paper addresses BLE-based indoor localization in care facilities by shifting from independent-window classification to sequential learning. The proposed DASEL framework combines frequency-based feature engineering, bidirectional GRUs with attention mechanisms, and a two-level hierarchical ensemble to model temporal movement trajectories. Achieving a 53.1% improvement over traditional baselines on the ABC 2026 challenge dataset, the work demonstrates that capturing temporal dependencies is critical for accurate indoor localization in complex real-world environments.
Zeroth-order (ZO) optimization enables memory-efficient training via forward-only gradient estimation, but its stochastic nature obscures training dynamics compared to well-characterized first-order (FO) methods. This paper introduces the Neural Zeroth-order Kernel (NZK) to describe model evolution in function space under ZO updates, proving that the expected NZK remains time-invariant for linear models and depends explicitly on the moments of random perturbation directions. The work extends to linearized neural networks and proposes using a single shared random vector to accelerate convergence, with experiments on synthetic and real-world datasets (MNIST, CIFAR-10, Tiny ImageNet) validating the theoretical predictions.
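The forward-only gradient estimation mentioned in the ZO summary above can be illustrated with the textbook two-point estimator. This is a generic sketch, not the paper's NZK analysis or its shared-random-vector variant. The function names, step size, and smoothing parameter `mu` are assumptions chosen for the example.

```python
import random


def zo_gradient(f, x, mu=1e-3, rng=random):
    """Two-point zeroth-order estimate: (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u.

    Only two forward evaluations of f are needed; no backpropagation.
    Returns the estimate together with the sampled direction u.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]  # random Gaussian perturbation direction
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)  # directional-derivative estimate
    return [scale * ui for ui in u], u


def zo_sgd(f, x, lr=0.05, steps=500, mu=1e-3, seed=0):
    """Plain gradient descent driven by the zeroth-order estimate above."""
    rng = random.Random(seed)
    for _ in range(steps):
        g, _u = zo_gradient(f, x, mu, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # Minimize a simple quadratic whose minimizer is (1, -2).
    print(zo_sgd(lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2, [0.0, 0.0]))
```

For a linear function $f(x) = a^\top x$ the estimate reduces to $(a^\top u)\,u$, whose expectation over Gaussian $u$ is $a$. This is one way to see why the moments of the perturbation directions enter the analysis.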
Cross-Layer Transcoders (CLTs) compress the attribution graphs used in mechanistic interpretability by sharing features across transformer layers, but their quadratic parameter scaling ($N_{\text{CLT}} \propto L^2$) makes training and analysis prohibitively expensive for most researchers. This paper introduces CLT-Forge, an open-source library that combines feature-sharded distributed training, compressed activation caching (int8/int4/int2 with zstd), automated interpretability pipelines, and integration with Circuit-Tracer to provide the first unified workflow for end-to-end CLT analysis at scale.
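The quadratic scaling $N_{\text{CLT}} \propto L^2$ noted in the CLT-Forge summary above can be made concrete with a back-of-the-envelope count. The sketch assumes the usual cross-layer layout, in which features read at layer $l$ write to the outputs of every layer from $l$ to $L$. The function name and the decision to ignore biases are assumptions, and the count is not taken from the CLT-Forge code.

```python
def clt_param_count(n_layers, d_model, n_features):
    """Approximate weight count for a cross-layer transcoder (biases ignored).

    ASSUMPTION: one encoder per layer, and one decoder block for every
    (source layer, target layer) pair with target >= source.
    """
    encoder = n_layers * d_model * n_features
    # Features read at layer l write to layers l..L: L * (L + 1) / 2 blocks.
    decoder_blocks = n_layers * (n_layers + 1) // 2
    decoder = decoder_blocks * n_features * d_model
    return encoder + decoder


if __name__ == "__main__":
    for depth in (12, 24, 48):
        print(depth, clt_param_count(depth, d_model=768, n_features=8192))
```

Under these assumptions, doubling the depth from 12 to 24 layers multiplies the weight count by 3.6 rather than 2, because the decoder term grows with $L(L+1)/2$.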
This paper investigates a fundamental paradox in hybrid sequence models: content-based routing requires exactly the pairwise computation it aims to avoid. Through 20+ controlled experiments, the authors demonstrate that one layer of softmax attention creates a latent $\sim$34-dimensional subspace via value aggregation, enabling 98.4% routing precision, while all alternatives (recurrence, linear attention, contrastive pretraining) cluster at 1–29%. These findings reframe attention as a representation constructor rather than merely a computation mechanism, providing a mechanistic explanation for why sub-quadratic models fail at associative recall.
SegMaFormer proposes a hybrid encoder for 3D medical image segmentation that places Mamba state-space layers in early high-resolution stages (for linear-complexity sequence mixing) and self-attention only in deeper low-resolution stages (where quadratic cost is manageable). The goal is to reduce the prohibitive compute of full 3D attention while preserving global context. With just 2M parameters and 15 GFLOPs, the authors claim competitive results on BraTS, Synapse, and ACDC benchmarks against models up to $75\times$ larger.
This paper addresses video moment retrieval (VMR) for complex multi-verb queries by proposing a two-stage framework that generates auxiliary short videos via text-to-video diffusion (CogVideoX) as temporal motion priors, then processes them through a linear-time Mamba network. The approach tackles the limitation of static image augmentations—which miss motion dynamics—while avoiding the quadratic complexity of Transformer-based methods on long untrimmed videos. The framework achieves state-of-the-art results on TVR with particular strength on multi-verb queries, though its effectiveness depends heavily on external video generation quality.
This paper identifies a subtle but important distinction between two interpretations of the TD error in reinforcement learning: the explicit form (bootstrapped target minus prediction) commonly used in deep RL, and the implicit form (difference between temporally successive predictions) from the original Sutton (1988) formulation. While equivalent in tabular settings, the authors demonstrate that increasingly nonlinear architectures cause these to diverge significantly, with profound implications for average-reward and differential RL algorithms.
This paper investigates why humans persist with failing strategies despite negative feedback, proposing 'confidence freeze'—a metastable state where early success decouples metacognitive confidence from behavior. Using a multi-reversal bandit task (N=332 across 3 experiments), the authors show that brief exposure to 90% success rates (vs. 60%) induces lock-in behavior where participants endure ~6 consecutive losses while reporting plummeting confidence, suggesting a dynamic mechanism rather than stable individual traits.
SPECTRE-G2 tackles epistemic uncertainty in safety-critical systems by detecting 'unknown unknowns'—inputs that violate the structural assumptions of the training distribution. Unlike prior work that relies on single signals (confidence, density, or reconstruction error), this paper proposes a multi-expert architecture combining eight complementary signals from a dual-backbone network. The core idea is that diverse structural anomalies require diverse detection mechanisms. The method achieves strong empirical results across synthetic causal, tabular, image, and RL environments, though some baseline implementations appear problematic.
This paper investigates whether LLMs exhibit genuine moral reasoning or merely produce convincing moral rhetoric through a large-scale empirical study of 13 models across 6 classical moral dilemmas. Using Kohlberg's stages of moral development as a diagnostic framework, the authors evaluate whether model outputs track human developmental patterns or reflect alignment training artifacts. The core claim is "moral ventriloquism": the hypothesis that models acquire post-conventional moral language through RLHF without the underlying cognitive architecture. The evidence offered is distributional inversion (86% of responses at Stages 5-6 vs. human Stage 4 dominance), near-robotic cross-dilemma consistency (ICC > 0.90), and "moral decoupling," where stated justifications misalign with action choices.
Learning from Label Proportions (LLP) trains instance-level classifiers using only bag-level class proportions, addressing privacy constraints and annotation costs. This paper introduces LLP-DC, which enforces dual constraints: bag-level mean predictions align with given proportions, while instance-level training uses hard pseudo-labels generated via minimum-cost maximum-flow to strictly satisfy proportion constraints. The method offers a novel formulation of LLP as a candidate label assignment problem, achieving state-of-the-art results across standard vision benchmarks.
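The hard pseudo-label step in the LLP-DC summary above can be sketched as a small min-cost max-flow problem. This is a minimal illustration solved with successive shortest paths (Bellman-Ford), not the authors' implementation. The negative log-probability edge cost, the function name, and the assumption that the bag's proportions have already been converted to integer class counts are all choices made for this example.

```python
import math


def assign_pseudo_labels(probs, class_counts):
    """Assign one class per instance so class c receives exactly class_counts[c].

    probs[i][c] is the model's probability of class c for instance i.
    ASSUMPTION: edge cost is -log(probs[i][c]); counts must sum to len(probs).
    """
    n, k = len(probs), len(class_counts)
    assert sum(class_counts) == n, "class counts must cover every instance"
    src, snk = n + k, n + k + 1
    graph = [[] for _ in range(n + k + 2)]  # edge: [to, capacity, cost, rev_index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])  # residual edge

    for i in range(n):
        add_edge(src, i, 1, 0.0)
        for c in range(k):
            add_edge(i, n + c, 1, -math.log(max(probs[i][c], 1e-12)))
    for c in range(k):
        add_edge(n + c, snk, class_counts[c], 0.0)

    for _round in range(n):  # each augmentation routes one unit of flow
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)
        dist[src] = 0.0
        for _pass in range(len(graph) - 1):  # Bellman-Ford relaxation passes
            updated = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for idx, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, idx)
                        updated = True
            if not updated:
                break
        v = snk
        while v != src:  # push one unit along the shortest path
            u, idx = prev[v]
            graph[u][idx][1] -= 1
            graph[v][graph[u][idx][3]][1] += 1
            v = u

    labels = [None] * n
    for i in range(n):
        for v, cap, _cost, _rev in graph[i]:
            if n <= v < n + k and cap == 0:  # saturated instance-to-class edge
                labels[i] = v - n
    return labels


if __name__ == "__main__":
    bag = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
    print(assign_pseudo_labels(bag, class_counts=[2, 2]))
```

In the usage example the third instance prefers class 0 on its own, but the count constraint of two instances per class pushes it to class 1. This is the sense in which the pseudo-labels strictly satisfy the bag proportions.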
Recent theoretical models of diffusion as coupled Ornstein-Uhlenbeck processes predict a hierarchy of interaction timescales creating a synchronization gap between global and local committing modes. This work investigates how this gap mechanistically emerges within pretrained Diffusion Transformers by introducing a controlled architectural realization of replica coupling via symmetric cross-attention gates with strength $g$. Through linearized analysis and empirical probing of DiT-XL/2 across all 28 layers, the authors demonstrate that the gap is an intrinsic, depth-localized property that collapses under strong coupling as $\mathcal{O}(\frac{1-g}{1+g})$, providing a bridge between continuous statistical physics and discrete transformer dynamics.
Multi-objective optimization of expensive biophysical neural simulations is hindered by high-dimensional parameter spaces and binary constraints that partition the search space without gradient signals. This paper introduces dmosopt, a framework that jointly learns objectives, constraints, and parameter sensitivities in a single differentiable surrogate model $f: \mathbb{R}^n \rightarrow \mathbb{R}^{q+k}$. By computing a unified gradient $\mathbf{g}_{\text{sopt}}$ that simultaneously steers toward improved objective values and greater constraint satisfaction, the method navigates feasibility manifolds that defeat standard approaches, achieving substantial speedups on problems ranging from single-cell models to million-neuron networks.