Directional abliteration removes refusal behavior from language models by projecting refusal-mediating directions out of weight matrices; these directions are extracted by contrasting activations on harmful prompts against activations on harmless prompts. This paper investigates whether topically matching the harmless baseline to the harmful prompts (for example, contrasting hacking prompts against defensive cybersecurity prompts) yields cleaner refusal directions than the standard practice of using general-purpose harmless prompts. The central finding is that topic-matched contrast completely fails to produce functional refusal directions while unmatched baselines succeed: matched subtraction cancels the dominant topic component shared between prompts on the same subject, leaving a residue too small to perturb the residual stream.
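The projection step described above can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: it assumes the refusal direction is a unit-normalized difference of mean activations and that the weight matrix writes into the residual stream (shape `d_model x d_in`). The function names, shapes, and random data are hypothetical.

```python
import numpy as np


def refusal_direction(harmful_acts, harmless_acts):
    """Unit-norm difference of mean activations (harmful minus harmless).

    Assumption: difference-of-means extraction; other extraction
    schemes are possible and are not covered here.
    """
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def project_out(weight, direction):
    """Return W' = (I - r r^T) W, so W' writes nothing along r."""
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r @ weight)


# Toy data standing in for residual-stream activations (hypothetical).
rng = np.random.default_rng(0)
d_model, d_in, n_prompts = 16, 8, 64
harmful_acts = rng.normal(size=(n_prompts, d_model)) + 1.0
harmless_acts = rng.normal(size=(n_prompts, d_model))

r_hat = refusal_direction(harmful_acts, harmless_acts)
W = rng.normal(size=(d_model, d_in))
W_abl = project_out(W, r_hat)
```

After the projection, `r_hat @ W_abl` is zero up to floating-point error, which is the sense in which the direction has been removed from the matrix's output. If the extracted difference is very small or noisy, as the summary reports for topic-matched baselines, normalizing it still gives a unit vector, but one that need not align with anything that mediates refusal.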
This paper addresses multimodal survival analysis for clinical data, integrating pathology text, tabular covariates, and gene expression using locally deployable LLMs. The core innovation is a teacher-student distillation framework that trains a compact 1.5B parameter causal LLM to jointly produce calibrated survival curves and concise prognosis explanations. This matters because cloud-hosted medical AI raises privacy concerns, yet heavyweight local models are impractical for many institutions.
P^2O tackles a critical bottleneck in Reinforcement Learning with Verifiable Rewards (RLVR): hard samples with near-zero success rates yield vanishing gradients, effectively starving the model of supervision signal. The method combines policy optimization with evolutionary prompt optimization (GEPA): optimized prompts are used to discover successful trajectories for hard samples, and these capabilities are then distilled into the model parameters via context distillation, so that no inference-time dependency on the prompts remains. Experiments on mathematical reasoning benchmarks demonstrate significant gains over GRPO baselines, particularly on challenging AIME problems (+12.3% avg.).
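The vanishing-signal problem can be made concrete with the group-relative advantage commonly used in GRPO, where each rollout's reward is standardized against the other rollouts for the same prompt. The sketch below assumes binary verifiable rewards, a population standard deviation, and a small `eps` in the denominator; these details are illustrative assumptions and may differ from the paper's exact setup.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of rollouts for a prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A hard sample: every rollout fails, so every advantage is zero
# and the policy-gradient term for this prompt carries no signal.
hard = group_relative_advantages([0.0] * 8)

# One successful rollout (for example, found with a better prompt)
# is enough to give the group non-zero advantages.
mixed = group_relative_advantages([0.0] * 7 + [1.0])
```

With all-zero rewards the advantages are identically zero, which is why discovering even a single successful trajectory for a hard sample restores a usable learning signal.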
Diffusion language models (DLMs) enable parallel token generation, but their efficiency depends critically on the decoding strategy that determines which tokens to unmask and when. This paper investigates confidence-based decoding, specifically an entropy sum strategy that adaptively batches tokens until cumulative prediction uncertainty exceeds a threshold, and proves that it achieves $\varepsilon$-accurate sampling in KL divergence with expected iteration complexity $\widetilde{O}(H(X_0)/\varepsilon)$. When the data distribution has low entropy ($H(X_0) \ll L$), this yields sublinear complexity in sequence length, providing the first theoretical foundation for why confidence-based methods accelerate sampling without sacrificing fidelity.
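One way to read the entropy sum rule is as a greedy batching step over the currently masked positions. The sketch below is an assumption-laden illustration, not the paper's algorithm: it ranks positions from most to least confident (lowest entropy first), stops before the running entropy total would exceed the threshold, and always unmasks at least one position so that decoding makes progress. The names `select_positions` and `masked_dists` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_positions(masked_dists, threshold):
    """Pick masked positions to unmask in one iteration.

    masked_dists maps a position index to the model's predictive
    distribution for that position (a list of probabilities).
    """
    ranked = sorted(masked_dists, key=lambda pos: entropy(masked_dists[pos]))
    chosen, total = [], 0.0
    for pos in ranked:
        h = entropy(masked_dists[pos])
        # Stop once adding this position would exceed the entropy budget,
        # but never return an empty batch.
        if chosen and total + h > threshold:
            break
        chosen.append(pos)
        total += h
    return chosen


dists = {
    0: [0.99, 0.01],
    1: [0.50, 0.50],
    2: [0.90, 0.10],
    3: [1.00, 0.00],
}
batch = select_positions(dists, threshold=0.5)
```

In this toy input the near-deterministic positions 3, 0, and 2 fit within the budget and are unmasked together, while the maximally uncertain position 1 is deferred to a later iteration. This matches the intuition behind the complexity bound: low-entropy sequences allow large batches and therefore few iterations.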
CurvZO tackles the memory wall problem in LLM fine-tuning by proposing a zeroth-order optimization method that tracks curvature signals online from scalar feedback instead of requiring pre-computed statistics. The core idea uses curvature-aware importance sampling to select which parameters to perturb in sparse ZO updates, coupled with an adaptive budget mechanism that adjusts sparsity based on the evolving curvature distribution. This matters because existing sparse ZO methods either rely on costly pre-computed Fisher information or use static/random sparsity patterns that may be suboptimal.
The paper addresses the scalability bottleneck in multi-user semantic communications by proposing JSRE (Joint Source and RIS-assisted channel Encoding), a framework that unifies all users under a single semantic encoder-decoder by embedding channel state information (CSI) into the encoding process. The core innovation leverages RIS phase shifts to create channel orthogonality while using CSI-conditioned semantic features to avoid per-user model training, coupled with a Truncated Deep Reinforcement Learning (T-DRL) algorithm that accelerates convergence via model caching and a surrogate similarity estimator. This matters because existing approaches like DeepMA require linearly growing model storage with user count, rendering them impractical for dense deployments.
This paper extends stochastic approximation (SA) theory to non-Markovian driving noise that is also non-ergodic, establishing that the ergodic decomposition of the original process corresponds to a Doeblin decomposition of an equivalent Markov chain. The core insight is that iterates retain memory of the distant past through the tail $\sigma$-field at $-\infty$, offering a theoretical lens on how learning algorithms might encode long-term dependencies. The author proposes this framework as a paradigm for understanding transformer attention mechanisms and continual learning, where the entire history influences current updates.
FinRL-X tackles the engineering gap between quantitative trading research and live deployment by introducing a weight-centric modular architecture that unifies data ingestion, strategy composition (selection–allocation–timing–risk), backtesting, and broker execution within a single protocol. The core insight is treating portfolio weights $w_t \in \mathbb{R}^n$ as the sole interface contract, enabling composable strategies without recoding execution logic.
This paper studies how batch size and sequence length should scale with the total token budget in stochastic conditional gradient methods for LLM training. Under a $\mu$-Kurdyka-Łojasiewicz condition, the authors derive a BST (Batch-Sequence-Token) scaling rule $BS \asymp T^{2/3}$ that predicts three distinct regimes: noise-dominated, batch-independent optimal, and iteration-starved. The theory yields actionable guidelines for adaptive batch size scheduling and is validated on NanoGPT models up to 1B parameters.
Generative recommender systems like TIGER excel at semantic retrieval but ignore the economic realities of monetization via sponsored content. This paper proposes GEM-Rec, a unified framework that augments semantic IDs with control tokens (<ORG>, <AD>) to factorize slot allocation from item generation, and introduces Bid-Aware Decoding to inject real-time auction bids into inference. The work bridges the gap between generative recommendation and computational advertising, offering theoretical guarantees like allocative monotonicity while allowing dynamic trade-offs between user relevance and platform revenue.
FISformer proposes replacing the dot-product self-attention in Transformers with a Sugeno-type Fuzzy Inference System (FIS) for time series forecasting. Instead of computing query-key similarities, the model fuzzifies tokens using learnable Gaussian membership functions, applies fuzzy rules, and defuzzifies to produce interaction weights. The paper suggests this approach captures uncertainty and nonlinearity better than standard attention, reporting state-of-the-art results on benchmarks like ETT, ECL, and Weather.
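For readers unfamiliar with Sugeno-type inference, the fuzzify, apply rules, defuzzify pipeline can be sketched for a single scalar input. This is the textbook zero-order Sugeno form with Gaussian membership functions, offered as an assumption about the general mechanism; how FISformer maps tokens to inputs and rules to interaction weights is not specified in this summary. In a learned model the centers, widths, and consequents would be trainable parameters, while here they are fixed, hypothetical values.

```python
import math


def gaussian_membership(x, center, sigma):
    """Degree to which x belongs to a fuzzy set centered at `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def sugeno_output(x, rules):
    """Zero-order Sugeno inference for a scalar input.

    Each rule is (center, sigma, consequent). The output is the
    firing-strength-weighted average of the rule consequents.
    Note: for inputs very far from every center the strengths can
    underflow to zero; a production version would guard the division.
    """
    strengths = [gaussian_membership(x, c, s) for c, s, _ in rules]
    total = sum(strengths)
    weighted = sum(w * z for w, (_, _, z) in zip(strengths, rules))
    return weighted / total


# Two hypothetical rules: "x is low -> 0.0" and "x is high -> 1.0".
rules = [(-1.0, 0.5, 0.0), (1.0, 0.5, 1.0)]
midpoint = sugeno_output(0.0, rules)
near_high = sugeno_output(1.0, rules)
```

An input halfway between the two centers fires both rules equally and yields the average of the consequents, while an input at the "high" center is dominated by the second rule. The smooth interpolation between rules is what distinguishes this from a hard lookup.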
Vector Diffusion Maps (VDM) capture pairwise connection relationships in complex datasets via the Graph Connection Laplacian, but eigenvalue decomposition costs $O(n^{2.81})$, prohibiting large-scale applications. This paper proposes LA-VDM (Landmark Accelerated VDM), which constrains diffusion through landmark points and introduces a novel two-stage normalization scheme with parameters $\alpha$ and $\beta$ to handle non-uniform sampling densities in both data and landmarks. Under a manifold model with the frame bundle structure, the authors prove that LA-VDM asymptotically converges to the connection Laplacian while reducing complexity to $O(nm^2)$, enabling applications to datasets with millions of points.
This paper proposes a multi-UAV architecture for autonomous precision agriculture that combines centralized mission planning with decentralized execution control. It integrates coverage path planning, battery-aware task allocation, CNN-based image processing, and battery swapping stations to enable end-to-end farm monitoring. The work targets large-scale agricultural operations with minimal human intervention, claiming advantages in fault-tolerance, scalability, and user-friendliness.
This paper tackles camera-agnostic pruning of 3D Gaussian splats for standardized interchange settings like MPEG I-3DGS, where training images, camera parameters, and gradients are unavailable. The authors propose BetaDescPrune, a one-shot post-training method that computes Hybrid Splat Feature Histogram (HSFH) descriptors to capture local geometric and appearance consistency, then models pruning decisions via Beta-distributed evidence with uncertainty-aware confidence scoring. The core insight is that reliable splat importance can be inferred from intrinsic neighborhood structure alone without rendering supervision.
ALMAB-DC unifies Gaussian process active learning, multi-armed bandit scheduling, and asynchronous distributed computing to tackle expensive black-box optimization in sequential experimental design. The framework targets dose-finding, spatial field estimation, and ML/engineering tasks, claiming superior sample efficiency and near-linear parallel speedups up to $K=16$ agents. While the modular architecture and ablation analyses are rigorous, all empirical results derive from calibrated surrogate emulators rather than live systems, substantially limiting external validity.
This paper studies nonparametric regression for learning degree-$k_0$ spherical polynomials on the unit sphere $\mathbb{S}^{d-1}$ using over-parameterized two-layer neural networks. The authors propose a novel Gradient Descent with Projection (GDP) algorithm that constrains learning to the top $r_0 = \Theta(d^{k_0})$ eigenspaces of the Neural Tangent Kernel (NTK). The main result establishes a nearly minimax optimal risk bound of order $\log(4/\delta) \cdot \Theta(d^{k_0}/n)$, improving the sample complexity from previous polynomial-in-$1/\varepsilon$ rates to linear $1/\varepsilon$ scaling.
Ctrl-A addresses automated data augmentation by framing it as a control problem, dynamically adjusting per-operation augmentation strengths via a feedback loop that balances training and validation loss ratios. The method introduces Relative Operation Response (ROR) curves to individually tune transformation distributions without manual initialization or expensive search phases. While it achieves competitive results on CIFAR and SVHN benchmarks with minimal computational overhead (~10% vs. TrivialAugment), the evaluation relies on a modified training setup with extended epochs, raising questions about separability of algorithmic gains from training protocol changes.
The paper proposes AV-LR, a lightweight amortized variational inference framework for logistic regression with missing covariates that eliminates latent variables entirely. Unlike VAE-based competitors, it directly models the posterior over missing values using a single neural network coupled with a linear classification layer, enabling joint optimization of imputation and prediction. The approach extends naturally to MNAR settings and claims substantial computational speedups over EM-based methods while maintaining comparable statistical accuracy.
S2tc-bdd addresses Semi-Supervised Text Classification (SSTC) where pseudo-label accuracy suffers from "margin bias" caused by imbalanced label angle variances between classes. The core idea is to balance deep representation distributions by applying Gaussian linear transformations to Angular Margin (AM) loss, thereby eliminating decision boundary bias during self-training. This matters because it targets a fundamental distribution mismatch in SSL that particularly degrades performance when labeled data is scarce.
ROM tackles overthinking in Large Reasoning Models, where models generate redundant reasoning after reaching correct answers. The core idea is a lightweight streaming detector (an 8.13M-parameter head attached to late-layer hidden states of a frozen LLM) that predicts overthinking probability token-by-token and triggers early stopping. It matters because it promises a 47% token reduction without full model retraining. We find the method empirically effective but note concerns regarding data scaling limits and labeling costs.