Preference alignment typically requires expensive weight-updating training like RLHF or DPO, which lacks mechanistic interpretability. This paper proposes DSPA, an inference-time method that dynamically steers sparse autoencoder (SAE) features based on prompt content without modifying base-model weights. By computing a sparse conditional-difference map $\mathbf{A}$ from preference triples that links prompt features to generation-control features, DSPA edits only token-active latents during decoding. The method achieves competitive open-ended generation quality with up to $4.47\times$ fewer alignment-stage FLOPs than training-based alternatives, while offering direct auditability of which features are modified and revealing that preference directions are dominated by discourse and stylistic signals.
Collaborative multi-agent LLM systems struggle with credit assignment during RL training: shared terminal rewards obscure individual contributions, which encourages free-riding and inflates gradient variance. This paper introduces CCPO (Counterfactual Credit Policy Optimization), which estimates each agent's marginal contribution by contrasting actual team performance with a counterfactual outcome that simulates performance without that specific agent. The method targets efficient discrete generation for LLMs and demonstrates improved reasoning accuracy across mathematical and logical benchmarks.
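The counterfactual contrast in the CCPO summary can be illustrated with a minimal sketch. It is not the paper's implementation: `team_score`, the agent names, and the additive toy scoring rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet

def counterfactual_credit(
    agents: FrozenSet[str],
    team_score: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Credit each agent with (full-team score) - (score with that agent removed)."""
    full = team_score(agents)
    return {a: full - team_score(agents - {a}) for a in agents}

# Hypothetical toy evaluator: each agent adds a fixed amount to the team score.
CONTRIB = {"solver": 0.6, "checker": 0.3, "idler": 0.0}

def toy_score(team: FrozenSet[str]) -> float:
    return sum(CONTRIB[a] for a in team)

credits = counterfactual_credit(frozenset(CONTRIB), toy_score)
```

Under a shared terminal reward all three agents would receive the same signal. With the counterfactual contrast, the hypothetical free-riding `idler` receives a credit of roughly zero.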
Machine learning models on mobile devices spend 61-86% of execution time extracting features from user behavior logs rather than running inference. This paper introduces AutoFeature, a graph-based engine that eliminates redundant operations across features and consecutive executions using directed acyclic graph optimization and intelligent caching. Tested across five industrial services including TikTok and e-commerce platforms, it achieves a 1.33× to 4.53× end-to-end latency reduction without accuracy loss.
This paper introduces a scalable framework to measure institutional variation in solid-organ transplant patient education materials using retrieval-augmented generation (RAG). The authors ground 1,115 patient questions across 102 handbooks from 23 U.S. centers, then classify answer pairs into a five-label taxonomy (Absent, Consistent, Complementary, Divergent, Contradictory). The work exposes critical information gaps: 96.2% of question-handbook pairs miss relevant content, and 20.8% of non-absent pairs show clinically meaningful divergence, with reproductive health nearly absent (95.1%) across all materials.
Recursive self-improvement promises sustained capability growth but faces recursive drift—the compounding of errors when models train on self-generated outputs. This paper proposes Neuro-Symbolic Recursive Self-Alignment (NSRSA), which stabilizes iterative self-training by filtering training data through symbolic verification at the reasoning step level. The core claim is that eliminating lucky guesses (correct answers with flawed reasoning) prevents recursive collapse and enables sustained improvement over multiple iterations.
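A step-level filter of the kind the NSRSA summary describes might look like the sketch below. The step format (`a op b = c`), the regular expression, and the function names are assumptions for illustration. The summary does not specify the actual symbolic verifier.

```python
import re
from operator import add, mul, sub
from typing import List

# Hypothetical step format: "<int> <op> <int> = <int>", with op in + - *
STEP = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")
OPS = {"+": add, "-": sub, "*": mul}

def step_ok(step: str) -> bool:
    """Symbolically re-check one arithmetic step; unparseable steps are rejected."""
    m = STEP.match(step)
    if m is None:
        return False
    a, op, b, c = m.groups()
    return OPS[op](int(a), int(b)) == int(c)

def keep_sample(steps: List[str], final: int, gold: int) -> bool:
    """Keep a self-generated sample only if the answer AND every step verify."""
    return final == gold and all(step_ok(s) for s in steps)

# A "lucky guess": the final answer is right but the reasoning is flawed.
lucky = keep_sample(["2 + 3 = 6", "6 * 2 = 10"], final=10, gold=10)
sound = keep_sample(["2 + 3 = 5", "5 * 2 = 10"], final=10, gold=10)
```

An answer-only filter would admit both samples into the next training round. The step-level check rejects the first one.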
Discrete diffusion models have been limited to simplistic noising schemes like uniform corruption or masking, restricting their ability to leverage semantic structure in large vocabularies. This paper introduces GDDS (Generalized Discrete Diffusion from Snapshots), a framework supporting arbitrary continuous-time Markov chain noising processes via exact uniformization-based sampling and a tractable snapshot-level ELBO. The work achieves state-of-the-art results on large-scale language modeling tasks, claiming to surpass autoregressive baselines for the first time at this scale.
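For readers unfamiliar with uniformization, the sketch below shows the textbook construction for drawing an exact sample from a continuous-time Markov chain. It is not the GDDS sampler itself: the three-state generator `Q_ABSORB` (with state 2 playing the role of an absorbing mask token) is a hypothetical example.

```python
import random
from typing import List

def sample_ctmc_uniformization(Q: List[List[float]], x0: int, t: float,
                               rng: random.Random) -> int:
    """Draw X_t for a CTMC with generator Q (rows sum to 0), started at x0."""
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate >= every exit rate
    if lam == 0.0:
        return x0  # no transitions are possible
    # Embedded discrete chain P = I + Q / lam; self-loops absorb the slack.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    x, clock = x0, rng.expovariate(lam)
    while clock <= t:  # each tick of the rate-lam Poisson clock is one P-step
        x = rng.choices(range(n), weights=P[x])[0]
        clock += rng.expovariate(lam)
    return x

# Hypothetical generator: states 0 and 1 decay into absorbing state 2.
Q_ABSORB = [[-0.5, 0.0, 0.5],
            [0.0, -1.0, 1.0],
            [0.0, 0.0, 0.0]]
x_t = sample_ctmc_uniformization(Q_ABSORB, x0=0, t=0.5, rng=random.Random(0))
```

Because the number of clock ticks in $[0, t]$ is Poisson with mean $\lambda t$ and each tick applies $P = I + Q/\lambda$, the returned state follows the law of $X_t$ exactly, with no time-discretization error.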
The paper addresses inefficiency in task-oriented dialogue systems that recompute answers via retrieval or generation each turn, even when answers are already derivable from prior state. It proposes framing safety certification as a computational primitive where the fixed-point closure $cl(A_t)$ contains all derivable capabilities, enabling a Certified Answer Store with Pre-Answer Blocks that eliminates redundant RAG calls through formal containment checks. This matters because it reduces mean RAG calls from 13.7 to 1.31 and latency from 18.8s to 340ms while eliminating unsafe cache hits that plague embedding-based approaches.
RoboAlign addresses the modality gap between high-level language reasoning and low-level robot control in Vision-Language-Action (VLA) models. The framework first uses supervised fine-tuning to teach a multimodal LLM to generate FAST action tokens through zero-shot chain-of-thought reasoning, then applies Group Relative Policy Optimization (GRPO) to refine reasoning based on token-level action accuracy. This matters because prior work showed that improving embodied reasoning via language supervision often fails to translate into better robot performance or even degrades it.
This paper develops a differential-geometric framework for shallow neural networks that treats predictor classes rather than raw parameters as the fundamental objects. By quotienting out permutation and scaling symmetries on a regular set $\Theta_{\mathrm{reg}}$, the authors define a function-induced metric $g_\theta$ and an effective Hessian that removes spurious curvature degeneracies along symmetry orbits. The work connects implicit bias to quotient-level geometry, with concrete analysis for quadratic-activation models where parameters map explicitly to symmetric matrices $Q(\theta)=\sum_{i=1}^m a_i w_i w_i^\top$.
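The symmetries being quotiented out can be checked numerically for the quadratic-activation case. The sketch below assumes the predictor $f(x)=\sum_i a_i (w_i^\top x)^2 = x^\top Q(\theta)\, x$, so that the scaling symmetry takes the form $(a_i, w_i) \mapsto (a_i/c_i^2,\ c_i w_i)$. The helper names and the numeric values are illustrative.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def Q_of_theta(a: Sequence[float], W: Sequence[Sequence[float]]) -> Matrix:
    """Q(theta) = sum_i a_i * w_i w_i^T for a width-m quadratic network."""
    d = len(W[0])
    Q = [[0.0] * d for _ in range(d)]
    for a_i, w_i in zip(a, W):
        for r in range(d):
            for s in range(d):
                Q[r][s] += a_i * w_i[r] * w_i[s]
    return Q

def rescale(a, W, c):
    """Per-neuron scaling symmetry: (a_i, w_i) -> (a_i / c_i^2, c_i * w_i)."""
    a2 = [a_i / (c_i * c_i) for a_i, c_i in zip(a, c)]
    W2 = [[c_i * x for x in w_i] for w_i, c_i in zip(W, c)]
    return a2, W2

a = [1.0, -2.0]
W = [[1.0, 2.0], [0.5, -1.0]]

Q_base = Q_of_theta(a, W)
Q_perm = Q_of_theta(a[::-1], W[::-1])           # permute the hidden units
Q_scal = Q_of_theta(*rescale(a, W, [2.0, 4.0]))  # rescale each hidden unit
```

All three parameter settings map to the same symmetric matrix, which is why curvature along these orbits carries no information about the predictor.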
KG-Hopper addresses Knowledge Base Question Answering (KBQA) by training compact 7B LLMs to perform multi-hop reasoning over Knowledge Graphs in a single inference round. Unlike sequential multi-step approaches that suffer from error cascades, it embeds the entire KG traversal process into a unified "thinking" stage using reinforcement learning. The core innovation is using GRPO (Group Relative Policy Optimization) with composite rewards to teach models to autonomously invoke retrieval tools via special tokens and reason across multiple hops without predefined pipelines.
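The group-relative part of GRPO can be sketched in a few lines. Each sampled response's reward is normalized against the other responses drawn for the same question. This sketch assumes the rewards passed in are already the combined composite scalars, and it uses the population standard deviation, which is one common choice rather than a detail confirmed by the summary.

```python
from statistics import mean, pstdev
from typing import List

def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Advantage of each rollout relative to its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts for one question; two reached the correct entity.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

No learned value function is needed: the group mean acts as the baseline, so rollouts that beat their siblings get positive advantages and the rest get negative ones.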
World models for reinforcement learning learn to simulate environment dynamics, yet what they represent internally remains unclear. This paper probes two architecturally distinct models—IRIS (a discrete token transformer) and DIAMOND (a continuous diffusion UNet)—on Atari Breakout and Pong using linear and MLP probes, causal interventions, and attention analysis to test whether they develop structured, interpretable representations of game state. The core finding is that world models develop approximately linear representations of salient state variables (ball position, score) that are not merely correlated but functionally used during prediction.
This paper tackles the persistent bottleneck of Multidisciplinary Software Development (MSD), where domain experts and software developers must manually coordinate across heterogeneous artifacts and incompatible formalisms. The authors model MSD workflows as a directed dependency graph $\mathcal{G}=(\mathcal{V},\mathcal{R})$ and propose an iterative optimization framework that replaces manual translation nodes with LLM-powered services. This matters because their approach reduces per-API development time from approximately 5 hours to under 7 minutes while maintaining production-quality code, demonstrating that workflow-level automation—not just coding assistance—can unlock substantial efficiency gains in industrial settings.
The paper addresses state-space explosion in partially observable environments by formalizing a bounded-interaction analogue of the Myhill-Nerode theorem for finite POMDPs. Its core insight is that two observation histories are equivalent if no bounded finite-state controller can distinguish them via closed-loop interaction, inducing a canonical quotient that is minimal and unique for that observer capacity. This yields a principled separation between exact decision sufficiency for observation-measurable objectives and approximate bounds for latent-state rewards, with the canonical object strictly requiring clock-aware probe families.
ARYA presents a world model architecture using "nano models"—small specialized components orchestrated by an autonomous agent (AARA)—rather than monolithic neural networks. The system claims physics-constrained determinism, sub-20-second training cycles, and an "unfireable" safety kernel that cannot be bypassed. The authors position this as production-deployed across seven industry domains from aerospace to pharma, achieving state-of-the-art results on six of nine benchmarks with "zero neural network parameters."
Unmanned aerial vehicle (UAV) path planning traditionally treats efficiency and safety objectives as a single multiobjective optimization problem. This paper proposes a biparty multiobjective formulation with separate decision-makers for efficiency and safety, adapting immune algorithms (NNIA, HEIA, AIMA) into BPNNIA, BPHEIA, and BPAIMA to find common Pareto optimal solutions. The work addresses the practical scenario where regulatory and operational departments have independent criteria.
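One way to read "common Pareto optimal" is that a path must be nondominated under each party's objectives separately. The sketch below works under that assumption, with all objectives minimized. The candidate paths and objective values are hypothetical, and the brute-force filter stands in for the immune algorithms, which it does not reproduce.

```python
from typing import Dict, Sequence, Set

def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v (minimization): no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))

def pareto_set(points: Dict[str, Sequence[float]]) -> Set[str]:
    """Names of points not dominated by any other point."""
    return {k for k, u in points.items()
            if not any(dominates(v, u) for j, v in points.items() if j != k)}

def common_pareto(*parties: Dict[str, Sequence[float]]) -> Set[str]:
    """Solutions that are Pareto optimal for every party at once."""
    return set.intersection(*(pareto_set(p) for p in parties))

# Hypothetical candidate paths scored by two independent decision-makers.
efficiency = {"A": (10, 5), "B": (12, 4), "C": (15, 6), "D": (11, 7)}      # (length, energy)
safety = {"A": (0.2, 3), "B": (0.5, 4), "C": (0.1, 5), "D": (0.3, 2)}      # (risk, noise)

common = common_pareto(efficiency, safety)
```

Here paths A and B are efficient for the first party and A, C, and D for the second, so only A survives the intersection. Merging all four objectives into a single problem would instead leave every path nondominated.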
Precision free-space optics demands sub-millimeter and sub-degree tolerances where traditional robotic pick-and-place fails. This work introduces a closed-loop robotics framework integrating hierarchical computer vision, Newton-based spatial optimization, and Bayesian angular optimization to autonomously construct, align, and maintain optical systems. The authors demonstrate this by building a tabletop laser cavity from randomly distributed components—achieving beam alignment, mode selection, and self-recovery without human intervention. The system bridges the gap between coarse robotic manipulation and the extreme precision required for functional optical experiments.
The paper tackles the challenge of controlling high-level behavioral traits in LLM agents deployed in strategic settings. Rather than treating models as black boxes via prompting, the authors construct 'persona vectors'—linear directions in activation space—for traits like altruism and forgiveness using contrastive activation addition. Applied to six canonical games, these vectors allow both measurement of behavioral tendencies and causal steering of decisions, offering a mechanistic handle on strategic behavior.
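The core of contrastive activation addition is a difference of mean activations, which is then used both to measure and to steer. The sketch below uses plain Python lists in place of real model activations. The two-dimensional vectors, the function names, and the steering strength `alpha` are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]

def mean_vec(rows: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]

def persona_vector(pos: Sequence[Sequence[float]],
                   neg: Sequence[Sequence[float]]) -> Vector:
    """Direction from trait-negative to trait-positive mean activation."""
    return [p - n for p, n in zip(mean_vec(pos), mean_vec(neg))]

def projection(h: Sequence[float], v: Sequence[float]) -> float:
    """Measurement: signed length of h along the persona direction."""
    return sum(x * y for x, y in zip(h, v)) / math.sqrt(sum(y * y for y in v))

def steer(h: Sequence[float], v: Sequence[float], alpha: float) -> Vector:
    """Steering: add alpha * v to a hidden state."""
    return [x + alpha * y for x, y in zip(h, v)]

# Toy activations from hypothetical "altruistic" vs. "selfish" prompts.
v = persona_vector(pos=[[1.0, 0.0], [3.0, 0.0]], neg=[[0.0, 0.0], [0.0, 2.0]])
h = [0.0, 0.0]
h_steered = steer(h, v, alpha=0.5)
```

The same vector serves both roles the summary mentions: `projection` reads off how strongly a state expresses the trait, and `steer` pushes the state along it.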
Organizations deploying commercial AI systems inherit vendor-imposed value constraints that limit which recommendations the system can produce. This paper formalizes these boundaries as a "behavioural feasible set" and demonstrates through controlled experiments that alignment training compresses this set, making AI systems structurally unable to endorse certain legitimate organizational actions even under strong contextual pressure. The work reframes AI governance from a capability question to a constraint diagnosis problem, showing that vendor selection partially determines which trade-offs remain negotiable for adopting firms.
Modern failure management pipelines tightly couple task-specific models with modality-specific encoders, blocking reuse across systems. RuntimeSlicer proposes a unified runtime state representation that encodes metrics, traces, and logs into a single embedding via Unified Runtime Contrastive Learning, then adapts to downstream tasks through State-Aware Task-Oriented Tuning. The core value is decoupling representation learning from failure management tasks—if it generalizes, teams could freeze the embedding backbone and ship lightweight task heads.
PivotRL addresses the compute-generalization trade-off in agentic post-training by extracting "pivot" states—intermediate turns with high outcome variance—from existing SFT trajectories and applying functional-equivalence rewards rather than strict string matching. The method achieves comparable accuracy to end-to-end RL on SWE-Bench with roughly one-quarter the rollout cost, while avoiding the catastrophic forgetting typical of supervised fine-tuning on long-horizon tool-use tasks.
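The two ingredients named in the PivotRL summary can be sketched as follows. Both parts are assumptions for illustration: the variance threshold `min_var`, the per-turn outcome samples, and the JSON tool-call format are hypothetical, and the paper's actual equivalence check may differ.

```python
import json
from statistics import pvariance
from typing import Dict, List

def select_pivots(outcomes_by_turn: Dict[int, List[float]],
                  min_var: float = 0.1) -> List[int]:
    """Keep turns whose sampled rollout outcomes disagree (high variance)."""
    return sorted(t for t, ys in outcomes_by_turn.items()
                  if pvariance(ys) >= min_var)

def equivalence_reward(predicted_call: str, reference_call: str) -> float:
    """Reward 1.0 if two tool calls parse to the same structure, else 0.0."""
    try:
        same = json.loads(predicted_call) == json.loads(reference_call)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if same else 0.0

# Hypothetical success/failure outcomes of 4 rollouts started from each turn.
outcomes = {0: [1.0, 1.0, 1.0, 1.0], 1: [1.0, 0.0, 1.0, 0.0],
            2: [0.0, 0.0, 0.0, 0.0], 3: [1.0, 1.0, 1.0, 0.0]}
pivots = select_pivots(outcomes)

reward = equivalence_reward(
    '{"tool": "grep", "args": {"path": "src", "pattern": "foo"}}',
    '{"args": {"pattern": "foo", "path": "src"}, "tool": "grep"}',
)
```

Turns where every rollout succeeds or every rollout fails carry no learning signal, so restricting rollouts to the pivots is where the cost saving would come from. The structural comparison accepts the reordered call that strict string matching would reject.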