This paper tackles parameter-efficient multi-task learning (PEFT-MTL), where the challenge is to share parameters across tasks without interference while maintaining the efficiency of methods like LoRA. The core idea is Free Sinewich: it modulates a shared low-rank convolutional adapter (Sine-AWB) using task-specific sinusoidal frequencies generated by a lightweight Clock Net, achieving task specialization without duplicating parameters. This frequency-switching mechanism is inspired by biological oscillatory multiplexing and aims to decorrelate task weights while boosting effective rank.
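The frequency-switching idea can be illustrated with a toy sketch. It assumes the modulation takes the form of an elementwise sine applied to the shared low-rank product, scaled by a task frequency; the summary does not give the exact formula. The names `sine_modulated_update` and `task_frequencies`, and the fixed frequency values (which in the paper would come from the Clock Net), are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sine_modulated_update(B, A, omega):
    """Assumed form: elementwise sin(omega * (B @ A)) on the shared product."""
    return [[math.sin(omega * v) for v in row] for row in matmul(B, A)]

def det3(m):
    """Determinant of a 3x3 matrix; nonzero means full rank."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Shared rank-1 factors, reused by every task (no per-task copies).
B = [[1.0], [2.0], [3.0]]
A = [[0.5, 1.0, 1.5]]

# Hypothetical per-task frequencies; stand-ins for Clock Net outputs.
task_frequencies = {"task_a": 1.0, "task_b": 3.0}
task_updates = {name: sine_modulated_update(B, A, omega)
                for name, omega in task_frequencies.items()}

shared_det = det3(matmul(B, A))              # rank-1 product, so singular
task_a_det = det3(task_updates["task_a"])    # sine lifts the rank
```

In this toy case the shared product is rank 1, while the sine-modulated matrix for `task_a` has a nonzero determinant. This is one way to see how a nonlinearity can raise effective rank without adding parameters.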
WinDiNet repurposes the LTX-Video latent diffusion transformer as a fast, differentiable surrogate for urban wind flow simulation, addressing the prohibitive cost of time-resolved CFD in design exploration. By fine-tuning the 2B-parameter video model on 10,000 2D incompressible CFD simulations over procedurally generated building layouts, the authors achieve sub-second generation of 112-frame rollouts while enabling end-to-end gradient-based optimization of building positions for pedestrian wind comfort.
Variational Quantum Classifiers (VQCs) are typically trained in ideal classical simulations, raising concerns about reproducibility on noisy quantum hardware. This paper proposes that the average relative entropy between class distributions, combined with transpilation depth, predicts noise robustness, and it introduces the log-DTSAE metric to forecast accuracy degradation without requiring noisy hardware execution. The authors validate this across thousands of models spanning diverse ansatzes, encodings, and simulated backends from IBM, IQM, and IonQ.
This paper establishes a comprehensive benchmark for photoplethysmography (PPG)-based clinical prediction using the large-scale MIMIC-III-Ext-PPG dataset, evaluating multi-task learning across arrhythmia classification (13 classes) and physiological regression (blood pressure, heart rate, respiratory rate). The core contribution is demonstrating robust atrial fibrillation detection (AUROC 0.96) with strong cross-dataset generalizability, alongside the first systematic assessment of fine-grained arrhythmia classification from PPG alone. It matters because PPG sensors are ubiquitous in wearables and ICUs, yet standardized, large-scale, multi-task benchmarks have been lacking, hindering meaningful algorithm comparison and clinical deployment.
Feature incremental clustering addresses dynamic scenarios where data arrives in expanding feature spaces—such as activity recognition systems that acquire new sensors over time. This paper proposes four k-means-based algorithms (FIC-FT, FIC-DR, FIC-DA, FIC-MR) tailored to different data-access constraints, from full historical access to model-only reuse. The core theoretical contribution establishes generalization error bounds for all four settings, revealing that model reuse (FIC-MR) can achieve a fast $\tilde{\mathcal{O}}(1/n_2)$ convergence rate when the pre-trained model aligns well with the current distribution.
Understanding what information neural network representations discard is crucial for trustworthy ML. This paper proposes methods to sample from invariant sets (fibers) of feature extractors: either by regularizing conditional generative models with a fiber loss, or by guiding pretrained diffusion models via non-linear diffusion trajectory matching (NDTM). The training-free NDTM approach reduces setup time from days to minutes, enabling rapid analysis of model blind spots, including those relevant to medical safety.
TIDE is a post-training early exit system for autoregressive LLMs that trains lightweight router MLPs to predict which tokens can safely exit at intermediate layers. The key idea is using cosine similarity between checkpoint hidden states and final layer outputs as a convergence signal, eliminating the need for costly model retraining. Unlike prior early-exit methods that require training from scratch or use unreliable confidence heuristics, TIDE claims to work with any HuggingFace causal LM while preserving KV cache integrity and achieving up to 8.1% throughput improvement.
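The convergence signal described above can be sketched as follows. This is a minimal illustration, not TIDE's implementation: the function `exit_labels`, the threshold value, and the toy hidden states are assumptions, and the resulting booleans are only meant to show what a router's training targets could look like.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def exit_labels(checkpoint_states, final_states, threshold=0.98):
    """Mark a token as safe to exit when its intermediate hidden state
    already points in nearly the same direction as the final-layer state.
    The threshold is a hypothetical value chosen for illustration."""
    return [cosine_similarity(h, f) >= threshold
            for h, f in zip(checkpoint_states, final_states)]

# Toy per-token hidden states at one checkpoint layer and at the final layer.
checkpoint = [[1.0, 0.0], [1.0, 1.0]]
final = [[2.0, 0.1], [-1.0, 1.0]]
labels = exit_labels(checkpoint, final)  # [True, False]
```

The first token's state has already converged in direction (cosine about 0.999), while the second is orthogonal to its final state, so only the first would be labeled as exit-eligible.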
Developing optimized CUDA kernels is critical for generative AI but remains challenging even for human experts. This paper introduces DRTriton, a framework that trains a 7B-parameter LLM to convert PyTorch code into efficient Triton kernels using exclusively synthetic data. The approach combines a constraint satisfaction algorithm for program generation (CSP-DAG), curriculum reinforcement learning with decoupled rewards (DRPO), and test-time search, achieving 92% speedup on KernelBench Level 2 compared to 23% for GPT-5.2.
GSEM addresses the challenge of building structured experience memory for clinical LLM agents. Unlike flat memory banks storing isolated records, it organizes clinical decisions into a dual-layer graph capturing both internal decision structure (entity layer) and inter-experience relational dependencies (experience layer), supporting applicability-aware retrieval and online feedback-driven calibration of node quality and edge weights. Experiments on medical benchmarks report strong improvements over RAG and memory-augmented baselines, achieving 70.90% average accuracy with DeepSeek-V3.2.
Large Vision-Language Models (LVLMs) suffer from quadratic self-attention costs when processing high-resolution images that generate thousands of visual tokens. ResPrune addresses this by formulating token pruning as a subspace reconstruction problem: it greedily selects tokens that maximize residual energy (the orthogonal component unexplained by the current subset) in the LLM input embedding space. To align selection with user queries, it modulates these residuals by a text relevance score computed via cosine similarity with embedded nouns from the prompt. This yields a training-free, plug-in method that preserves semantic coverage while reducing compute.
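A rough sketch of the greedy residual-energy selection follows. It assumes the relevance score multiplies the squared residual norm and that residuals are computed by Gram-Schmidt projection onto the already selected tokens; the summary does not specify either detail. The function names and the toy embeddings are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual(x, basis):
    """Component of x orthogonal to the span of an orthonormal basis."""
    r = list(x)
    for q in basis:
        coeff = dot(r, q)
        r = [ri - coeff * qi for ri, qi in zip(r, q)]
    return r

def greedy_select(tokens, relevance, k):
    """Pick k token indices, each time taking the token whose
    relevance-weighted residual energy is largest (assumed scoring rule)."""
    basis, chosen = [], []
    for _ in range(min(k, len(tokens))):
        best_index, best_score, best_residual = None, -1.0, None
        for i, x in enumerate(tokens):
            if i in chosen:
                continue
            r = residual(x, basis)
            score = relevance[i] * dot(r, r)
            if score > best_score:
                best_index, best_score, best_residual = i, score, r
        chosen.append(best_index)
        norm = math.sqrt(dot(best_residual, best_residual))
        if norm > 1e-12:  # skip directions already fully explained
            basis.append([v / norm for v in best_residual])
    return chosen

# Toy visual-token embeddings: tokens 0 and 1 are nearly redundant.
tokens = [[3.0, 0.0, 0.0], [2.9, 0.1, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
uniform_pick = greedy_select(tokens, [1.0, 1.0, 1.0, 1.0], k=2)  # [0, 2]
query_pick = greedy_select(tokens, [1.0, 1.0, 1.0, 5.0], k=2)    # [0, 3]
```

With uniform relevance the near-duplicate token 1 is skipped because token 0 already explains it. Raising the relevance of token 3 changes the second pick, which is the intended effect of query-aware modulation.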
SSAM tackles the problem of merging independently trained multimodal large language models (e.g., vision-language and audio-language specialists) into a single model capable of processing arbitrary modality combinations without any paired multimodal training data. The core idea is to project language-specific parameter updates (task vectors) onto a shared low-rank subspace identified via SVD, thereby aligning consistent update directions while filtering conflicting ones before merging. This is significant because it offers a training-free alternative to expensive joint multimodal training, achieving state-of-the-art results on four benchmarks.
This paper presents a large-scale comparative study of memorization across six open LLM families (Pythia, OLMo1/2/3, OpenLLaMA, StarCoder) ranging from 1B to 32B parameters. By analyzing both statistical patterns and internal mechanisms (attention heads, layer decoding), it identifies universal behaviors—such as log-linear scaling of memorization rates with model size and high compressibility of memorized sequences—while revealing family-specific signatures in memorization structure. The work bridges isolated findings from single-model studies to establish general principles of how transformers memorize training data.
This paper proposes a training-free conditional diffusion model for Bayesian filtering in data assimilation. Instead of learning the score function via neural networks, the authors leverage kernel density estimation (KDE) to represent the joint distribution of states and measurements, yielding a closed-form expression for the score that enables analytical sampling from the posterior. The method targets nonlinear, non-Gaussian filtering problems where traditional ensemble Kalman filters (EnKF) make restrictive Gaussian approximations and particle filters suffer from weight degeneracy in small-ensemble regimes.
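To make the closed-form score concrete, here is a one-dimensional sketch. It assumes a Gaussian product kernel over the joint samples with bandwidths `hx` and `hy`, and it covers only the score of the plain KDE conditional, not the noise-perturbed score along a diffusion trajectory. All names and sample values are illustrative.

```python
import math

def _kernel_logits(x, y, xs, ys, hx, hy):
    """Log of the unnormalized Gaussian product kernel at each joint sample."""
    return [-(x - xi) ** 2 / (2 * hx ** 2) - (y - yi) ** 2 / (2 * hy ** 2)
            for xi, yi in zip(xs, ys)]

def log_joint_kde(x, y, xs, ys, hx, hy):
    """log p(x, y) under the KDE, up to an additive constant."""
    logits = _kernel_logits(x, y, xs, ys, hx, hy)
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits))

def conditional_score(x, y, xs, ys, hx, hy):
    """d/dx log p(x | y). Since p(y) does not depend on x, this equals
    d/dx log p(x, y), a softmax-weighted average of (x_i - x) / hx**2."""
    logits = _kernel_logits(x, y, xs, ys, hx, hy)
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return sum(w * (xi - x) for w, xi in zip(weights, xs)) / (total * hx ** 2)

# Toy ensemble of (state, measurement) pairs.
xs = [-1.0, 0.5, 2.0]
ys = [0.0, 1.0, 1.5]
score = conditional_score(0.3, 0.8, xs, ys, hx=0.7, hy=0.5)
```

Ensemble members whose measurement component lies close to the observed `y` receive larger weights, so the score pulls `x` toward states consistent with the measurement.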
EvoIdeator addresses the challenge of iteratively refining scientific research ideas using LLMs by bridging the gap between scalar RL rewards and coarse language feedback. The core innovation is a dual-signal approach combining lexicographic rewards with checklist-grounded, span-level language feedback integrated directly into the RL training loop using Dr. GRPO. This allows a 4B parameter model to outperform larger frontier models like Gemini 3 Flash and DeepSeek-V3.2 on scientific rigor criteria.
The paper investigates whether large language models possess genuine "introspective awareness"—the ability to detect and identify concept steering vectors injected into their residual stream—or whether this behavior stems from shallow heuristics. Through behavioral experiments and mechanistic analysis on Gemma3-27B, the authors establish that detection maintains 0% false positives across diverse prompts, emerges specifically from post-training rather than pretraining, and relies on distributed MLP computation involving distinct "evidence carrier" and "gate" features. The work suggests models possess latent introspective capacity that default prompting dramatically under-elicits.
This vision paper from the vLLM Semantic Router project proposes the Workload-Router-Pool (WRP) architecture, a three-dimensional framework for LLM inference optimization. The authors synthesize two dozen prior publications into a structured matrix, arguing that workload characteristics, routing policy, and pool architecture are coupled dimensions that must be co-optimized. The paper maps existing work onto a $3\times3$ interaction matrix and proposes twenty-one concrete research directions tiered by maturity.
The paper studies calibeating—post-processing external forecasts online to minimize cumulative losses while matching an informativeness-based benchmark. Unlike prior work that used loss-specific arguments, the authors reduce calibeating to standard online learning primitives, showing it is minimax-equivalent to regret minimization. This yields optimal rates for general proper losses and improves bounds for simultaneous calibration and calibeating.
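For intuition, the classic calibeating procedure replaces each external forecast with the running average of outcomes observed on earlier rounds that received the same forecast. The sketch below shows that baseline idea only; it is not the online-learning reduction the paper develops. The function names and the fallback to the raw forecast on first sight are assumptions.

```python
from collections import defaultdict

def calibeat(forecasts, outcomes):
    """Post-process forecasts online: output the mean of past outcomes
    seen under the same forecast value, or the forecast itself if unseen."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    outputs = []
    for forecast, outcome in zip(forecasts, outcomes):
        if counts[forecast] > 0:
            outputs.append(sums[forecast] / counts[forecast])
        else:
            outputs.append(forecast)
        # The outcome is revealed only after the prediction is made.
        sums[forecast] += outcome
        counts[forecast] += 1
    return outputs

def brier_score(predictions, outcomes):
    """Mean squared error between predictions and binary outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)

# An overconfident forecaster that always says 0.8.
forecasts = [0.8, 0.8, 0.8, 0.8]
outcomes = [1, 0, 0, 0]
corrected = calibeat(forecasts, outcomes)
```

Here the corrected sequence starts at the raw forecast and then tracks the empirical frequency within the forecast's bin, which is the sense in which the post-processed forecasts retain the original's informativeness while improving calibration.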
JANUS addresses jailbreaking of text-to-image models by reframing the discrete prompt search as optimization over a structured distribution. The framework mixes two Gaussian-anchored prompt distributions—one around the target harmful prompt and one around a sanitized 'clean' version—and uses policy gradient on a single scalar mixing parameter $\alpha$ to maximize end-to-end reward. This avoids both proxy-loss optimization and costly LLM-based generators, achieving substantial efficiency gains while exposing weaknesses in current safety pipelines.
Enterprise AI agents face a fundamental dilemma: complex reasoning demands large-scale training data, yet enterprise domains offer limited, noisy trajectories and prohibit online self-play. This paper proposes Context Engineering via DT-MDP (DT-MDP-CE), a framework that abstracts LLM agent behavior into a finite Digital-Twin Markov Decision Process, learns per-step rewards via contrastive inverse RL (T-REX) from ranked offline trajectories, and deploys the resulting policy to guide context engineering. This enables performance gains without fine-tuning the base model or interacting with the environment during training.
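The reward-learning step can be sketched with the pairwise ranking loss used by T-REX: the probability that the better trajectory is preferred is a softmax over the two summed rewards. The sketch assumes a tabular reward over the states of a small finite MDP and plain gradient descent; the state encoding, learning rate, and function names are hypothetical and not taken from the paper.

```python
import math

def trajectory_return(reward, trajectory):
    """Sum of per-step rewards along a trajectory of state indices."""
    return sum(reward[s] for s in trajectory)

def ranking_loss(reward, ranked_pairs):
    """Mean of -log P(better preferred over worse) under a Bradley-Terry model."""
    total = 0.0
    for worse, better in ranked_pairs:
        gap = trajectory_return(reward, worse) - trajectory_return(reward, better)
        total += math.log(1.0 + math.exp(gap))
    return total / len(ranked_pairs)

def fit_rewards(ranked_pairs, n_states, lr=0.1, steps=200):
    """Gradient descent on the ranking loss over a tabular reward."""
    reward = [0.0] * n_states
    for _ in range(steps):
        grad = [0.0] * n_states
        for worse, better in ranked_pairs:
            gap = trajectory_return(reward, worse) - trajectory_return(reward, better)
            p_better = 1.0 / (1.0 + math.exp(gap))
            for s in better:
                grad[s] -= 1.0 - p_better
            for s in worse:
                grad[s] += 1.0 - p_better
        reward = [r - lr * g for r, g in zip(reward, grad)]
    return reward

# Each pair is (worse trajectory, better trajectory) over states 0..2.
pairs = [([0, 1], [0, 2]), ([1, 1], [2, 1])]
learned = fit_rewards(pairs, n_states=3)
```

States that appear more often in preferred trajectories end up with higher learned reward, and states shared equally by both trajectories in a pair receive no net update.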
Medical text summarization helps clinicians process millions of biomedical articles, but fine-tuning large language models demands prohibitive resources. This paper compares Low-Rank Adaptation (LoRA), Prompt Tuning, and full fine-tuning across Flan-T5-Small, Base, and Large on PubMed summarization. The counter-intuitive finding is that updating fewer than 1% of parameters via LoRA consistently outperforms full fine-tuning, suggesting that low-rank constraints provide effective regularization.
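The scale of the parameter savings can be checked with simple arithmetic. For a weight matrix of shape `d_out x d_in`, a rank-`r` LoRA adapter trains `r * (d_in + d_out)` parameters. The dimensions below are illustrative and are not taken from the paper's Flan-T5 configurations.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)

def trainable_fraction(d_in, d_out, rank):
    """Adapter parameters relative to the frozen weight matrix they adapt."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)

# Illustrative 1024 x 1024 projection with rank 4.
count = lora_param_count(1024, 1024, 4)        # 8192
fraction = trainable_fraction(1024, 1024, 4)   # 0.0078125, i.e. about 0.78%
```

This fraction is per adapted matrix. The model-wide share is typically smaller still, because adapters are usually attached to only some of the weight matrices.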