Symbolic regression search spaces suffer from structural redundancy: expression DAGs with $k$ internal nodes admit $\Theta(k!)$ distinct node-numberings that encode the same mathematical expression. This paper proposes IsalSR, a representation framework that computes a pruned canonical string, a complete labeled-DAG isomorphism invariant, to collapse all equivalent forms into a single canonical representation. The approach promises to shrink the effective search space by a factor of $O(k!)$ and can be integrated into any existing SR algorithm as a preprocessing step.
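To make the numbering redundancy concrete, here is a minimal sketch of a numbering-independent string for a small expression DAG. It is not IsalSR's pruned canonical string: the `Dag` encoding and the `canonical_string` helper are assumptions made for illustration, and the string unfolds shared nodes, so it identifies expressions rather than DAGs up to isomorphism.

```python
from typing import Dict, Tuple

# Assumed encoding: node id -> (label, ordered child ids). Ids are arbitrary.
Dag = Dict[int, Tuple[str, Tuple[int, ...]]]

def canonical_string(dag: Dag, root: int) -> str:
    """Build a string from labels and structure only, never from node ids."""
    memo: Dict[int, str] = {}

    def visit(node: int) -> str:
        if node in memo:              # shared sub-expressions are visited once
            return memo[node]
        label, children = dag[node]
        parts = [visit(child) for child in children]
        memo[node] = label if not parts else f"{label}({','.join(parts)})"
        return memo[node]

    return visit(root)

# The expression x*y + sin(x), stored under two different node numberings.
dag_a: Dag = {0: ("+", (1, 2)), 1: ("*", (3, 4)), 2: ("sin", (3,)),
              3: ("x", ()), 4: ("y", ())}
dag_b: Dag = {9: ("+", (7, 5)), 7: ("*", (1, 2)), 5: ("sin", (1,)),
              1: ("x", ()), 2: ("y", ())}

string_a = canonical_string(dag_a, 0)
string_b = canonical_string(dag_b, 9)
```

Both numberings map to the same string, so a search procedure that deduplicates on it would evaluate the expression once.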
GSEM addresses the challenge of building structured experience memory for clinical LLM agents. Unlike flat memory banks storing isolated records, it organizes clinical decisions into a dual-layer graph capturing both internal decision structure (entity layer) and inter-experience relational dependencies (experience layer), supporting applicability-aware retrieval and online feedback-driven calibration of node quality and edge weights. Experiments on medical benchmarks report strong improvements over RAG and memory-augmented baselines, achieving 70.90% average accuracy with DeepSeek-V3.2.
EvoIdeator addresses the challenge of iteratively refining scientific research ideas using LLMs by bridging the gap between scalar RL rewards and coarse language feedback. The core innovation is a dual-signal approach combining lexicographic rewards with checklist-grounded, span-level language feedback integrated directly into the RL training loop using Dr. GRPO. This allows a 4B-parameter model to outperform larger frontier models such as Gemini 3 Flash and DeepSeek-V3.2 on scientific rigor criteria.
The paper studies calibeating—post-processing external forecasts online to minimize cumulative losses while matching an informativeness-based benchmark. Unlike prior work that used loss-specific arguments, the authors reduce calibeating to standard online learning primitives, showing it is minimax-equivalent to regret minimization. This yields optimal rates for general proper losses and improves bounds for simultaneous calibration and calibeating.
Enterprise AI agents face a fundamental dilemma: complex reasoning demands large-scale training data, yet enterprise domains offer limited, noisy trajectories and prohibit online self-play. This paper proposes Context Engineering via DT-MDP (DT-MDP-CE), a framework that abstracts LLM agent behavior into a finite Digital-Twin Markov Decision Process, learns per-step rewards via contrastive inverse RL (T-REX) from ranked offline trajectories, and deploys the resulting policy to guide context engineering—enabling performance gains without fine-tuning the base model or interacting with the environment during training.
Medical text summarization helps clinicians process millions of biomedical articles, but fine-tuning large language models demands prohibitive resources. This paper compares Low-Rank Adaptation (LoRA), Prompt Tuning, and full fine-tuning across Flan-T5-Small, Base, and Large on PubMed summarization. The counter-intuitive finding is that updating fewer than 1% of parameters via LoRA consistently outperforms full fine-tuning, suggesting that low-rank constraints provide effective regularization.
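The parameter saving behind LoRA follows from simple arithmetic: a weight matrix of shape d_out x d_in is frozen and a trainable update B A is learned, where A is rank x d_in and B is d_out x rank. The sketch below counts trainable parameters for a single matrix; the sizes are hypothetical and not taken from the paper.

```python
from typing import Tuple

def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # full fine-tuning trains every entry of W
    lora = rank * (d_in + d_out)   # A is rank x d_in, B is d_out x rank
    return full, lora

# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_out=1024, d_in=1024, rank=4)
fraction = lora / full
print(f"{lora} of {full} parameters ({fraction:.2%})")
```

For a whole model the fraction is usually lower still, because adapters are typically attached to only some of the weight matrices.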
Most Time Series Foundation Models treat channels independently and ignore cross-channel correlations, which limits their performance on multivariate forecasting. This paper proposes CoRA (CoRrelation-aware Adapter), a lightweight plug-in that learns three correlation types—dynamic (time-varying), heterogeneous (positive/negative), and partial (sparse)—through a low-rank decomposition and dual contrastive learning. The key insight is that these correlations can be captured during fine-tuning without re-pretraining the foundation model, and with only linear complexity at inference time.
This paper tackles precipitation nowcasting by enhancing the lightweight SmaAT-UNet architecture with two modifications: a vector-quantization (VQ) bottleneck that discretizes latent representations into a learned codebook, and Mixed Convolution (MixConv) blocks that blend multiple kernel sizes to reduce parameters. The goal is to cut model size for edge deployment while preserving forecast skill at a 30-minute lead time.
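The core step of a vector-quantization bottleneck is a nearest-neighbor lookup: each latent vector is replaced by the index of its closest codebook entry. The following is a minimal sketch of that lookup only, with a hand-written codebook as an assumption; it omits codebook learning and is not the paper's architecture.

```python
from typing import List, Sequence

def quantize(z: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to z (squared L2)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(z, codebook[i]))

# Hypothetical 2-D codebook and latent vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
latents = [(0.1, -0.2), (0.9, 1.2), (-0.8, 1.7)]

indices = [quantize(z, codebook) for z in latents]
quantized = [codebook[i] for i in indices]  # what the decoder would receive
```

After quantization the decoder sees only codebook entries, so the latent representation is restricted to a finite set of learned vectors.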
Suiren-1.0 introduces a family of molecular foundation models designed to bridge the gap between microscopic 3D quantum-mechanical conformations and macroscopic 2D molecular property prediction. The framework comprises Suiren-Base (a 1.8B-parameter SE(3)-equivariant GNN pre-trained on 70M DFT samples), Suiren-Dimer (continued pre-training on intermolecular interactions), and Suiren-ConfAvg (a lightweight 2D model distilled via a novel Conformation Compression Distillation diffusion framework). This work matters because it attempts to unify quantum-accurate representations with practical cheminformatics workflows where only SMILES or graph inputs are available.
Directional abliteration removes refusal behavior from language models by projecting refusal-mediating directions out of weight matrices, where these directions are extracted by contrasting harmful against harmless prompt activations. This paper investigates whether topically matching the harmless baseline to harmful prompts — using, for example, defensive cybersecurity prompts to contrast against hacking prompts — yields cleaner refusal directions than the standard practice of using general-purpose harmless prompts. The central finding is that topic-matched contrast completely fails to produce functional refusal directions while unmatched baselines succeed, because matched subtraction cancels the dominant topic component shared between prompts of the same subject, leaving residue too small to perturb the residual stream.
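The cancellation argument can be illustrated with toy vectors. In this sketch, which is an assumption-laden illustration rather than the paper's experiment, each mean activation is the sum of a large topic component and a small refusal-related component; the numbers are invented to show the geometry only.

```python
from typing import List

Vector = List[float]

def mean(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def contrast_norm(group_a: List[Vector], group_b: List[Vector]) -> float:
    """L2 norm of the difference between the two group means."""
    diff = [a - b for a, b in zip(mean(group_a), mean(group_b))]
    return sum(x * x for x in diff) ** 0.5

# Toy activations: axis 0 is a "topic" component, axis 1 a "refusal" component.
harmful = [[10.0, 1.0], [10.0, 1.0]]           # topic plus refusal signal
matched_harmless = [[10.0, 0.0], [10.0, 0.0]]  # same topic, no refusal signal
general_harmless = [[0.0, 0.0], [0.0, 0.0]]    # different topic, no refusal

matched_norm = contrast_norm(harmful, matched_harmless)
unmatched_norm = contrast_norm(harmful, general_harmless)
```

With a matched baseline the shared topic component subtracts away and the remaining difference is small, whereas the unmatched difference retains the large topic component.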
This paper addresses multimodal survival analysis for clinical data, integrating pathology text, tabular covariates, and gene expression using locally deployable LLMs. The core innovation is a teacher-student distillation framework that trains a compact 1.5B-parameter causal LLM to jointly produce calibrated survival curves and concise prognosis explanations. This matters because cloud-hosted medical AI raises privacy concerns, yet heavyweight local models are impractical for many institutions.
P^2O tackles a critical bottleneck in Reinforcement Learning with Verifiable Rewards (RLVR): hard samples with near-zero success rates yield vanishing gradients, effectively starving the model of supervision signals. The solution combines policy optimization with evolutionary prompt optimization (GEPA): optimized prompts are used to discover successful trajectories for hard samples, and these capabilities are then distilled into model parameters via context distillation to avoid inference-time dependencies. Experiments on mathematical reasoning benchmarks demonstrate significant gains over GRPO baselines, particularly on challenging AIME problems (+12.3% avg.).
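The vanishing-signal problem is visible in the group-normalized advantage commonly used with GRPO, where each rollout's reward is centered on the group mean and divided by the group standard deviation. The sketch below assumes that standard form and a hypothetical `eps` stabilizer; it is not code from the paper.

```python
from statistics import mean, pstdev
from typing import List

def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Center rewards on the group mean and scale by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# A hard prompt: every rollout in the group fails, so all rewards are zero.
all_fail = group_advantages([0.0, 0.0, 0.0, 0.0])

# One success in the group is enough to produce a non-zero learning signal.
one_success = group_advantages([0.0, 0.0, 0.0, 1.0])
```

When every rollout fails, all advantages are zero and the policy gradient for that prompt vanishes, which is why discovering even one successful trajectory matters.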
Diffusion language models (DLMs) enable parallel token generation, but their efficiency depends critically on the decoding strategy that determines which tokens to unmask and when. This paper investigates confidence-based decoding—specifically an entropy sum strategy that adaptively batches tokens until cumulative prediction uncertainty exceeds a threshold—and proves it achieves $\varepsilon$-accurate sampling in KL divergence with expected iteration complexity $\widetilde{O}(H(X_0)/\varepsilon)$. When the data distribution has low entropy ($H(X_0) \ll L$), this yields sublinear complexity in sequence length, providing the first theoretical foundation for why confidence-based methods accelerate sampling without sacrificing fidelity.
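One way to read the entropy sum strategy is as a greedy batching rule over the masked positions. The sketch below is an assumption-based illustration: the lowest-entropy-first ordering, the rule of always unmasking at least one token, and the `select_batch` name are choices made here, not details confirmed by the paper.

```python
import math
from typing import Dict, List

def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_batch(dists: Dict[int, List[float]], threshold: float) -> List[int]:
    """Pick masked positions to unmask while the entropy sum stays in budget."""
    ranked = sorted(dists, key=lambda pos: entropy(dists[pos]))
    chosen: List[int] = []
    total = 0.0
    for pos in ranked:
        h = entropy(dists[pos])
        if chosen and total + h > threshold:
            break                 # budget exhausted, defer the rest
        chosen.append(pos)        # always unmask at least one position
        total += h
    return chosen

# Hypothetical predictive distributions over a two-token vocabulary.
dists = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.5, 0.5], 3: [0.99, 0.01]}
batch = select_batch(dists, threshold=0.5)
```

Confident positions are cheap under this budget, so low-entropy sequences are unmasked in few iterations, which matches the intuition behind the $\widetilde{O}(H(X_0)/\varepsilon)$ rate.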
CurvZO tackles the memory wall problem in LLM fine-tuning by proposing a zeroth-order optimization method that tracks curvature signals online from scalar feedback instead of requiring pre-computed statistics. The core idea uses curvature-aware importance sampling to select which parameters to perturb in sparse ZO updates, coupled with an adaptive budget mechanism that adjusts sparsity based on the evolving curvature distribution. This matters because existing sparse ZO methods either rely on costly pre-computed Fisher information or use static/random sparsity patterns that may be suboptimal.
This paper addresses vision-only UAV navigation in GNSS-denied environments by moving beyond the standard "matching-to-tile" (M2T) paradigm. Instead of retrieving discrete satellite tiles, the proposed Bearing-UAV method jointly regresses continuous position and heading from four neighboring satellite tiles and a UAV view patch, enabling sub-tile localization accuracy while maintaining a lightweight model. The work also introduces Bearing-UAV-90K, a multi-city dataset with heading annotations designed for unaligned cross-view scenarios.
This paper introduces ChronoCon, a self-supervised method that repurposes Rank-N-Contrast learning to use temporal ordering of longitudinal medical scans instead of expert severity labels. By assuming monotonic progression in irreversible diseases, the method learns progression-aware representations from routinely archived clinical metadata. The core finding is that under few-shot scenarios—using labels from only 5 patients—the model achieves an ICC of 86% for disease severity prediction on rheumatoid arthritis radiographs, potentially reducing reliance on costly expert annotations.
BadminSense is a smartwatch-based system for fine-grained badminton stroke evaluation that aims to provide amateur players with professional-quality coaching feedback without requiring expensive external equipment. The system uses a single commercial smartwatch on the dominant wrist to segment and classify four stroke types, predict stroke quality on a 5-point Likert scale, and estimate shuttle impact location on the racket string area. The key innovation is enabling fine-grained quality assessment beyond simple activity recognition, targeting the gap between basic fitness tracking and professional coaching.
This paper tackles the instability of Group Relative Policy Optimization (GRPO) when applied to video generation. The core problem is that converting deterministic ODE samplers to SDE for exploration injects excess noise in high-noise regimes, causing off-manifold drift that degrades rollout quality and destabilizes reward updates. SAGE-GRPO introduces a precise SDE with logarithmic curvature correction to keep exploration closer to the flow trajectory, plus a Dual Trust Region mechanism combining periodic moving anchors with stepwise KL constraints to prevent long-horizon drift. The method is evaluated on HunyuanVideo1.5 using VideoAlign rewards, showing improvements over DanceGRPO, FlowGRPO, and CPS.
Generative recommender systems like TIGER excel at semantic retrieval but ignore the economic realities of monetization via sponsored content. This paper proposes GEM-Rec, a unified framework that augments semantic IDs with control tokens (<ORG>, <AD>) to factorize slot allocation from item generation, and introduces Bid-Aware Decoding to inject real-time auction bids into inference. The work bridges the gap between generative recommendation and computational advertising, offering theoretical guarantees like allocative monotonicity while allowing dynamic trade-offs between user relevance and platform revenue.
Hyperbolic Vision-Language Models (VLMs) improve hierarchical structure preservation over Euclidean counterparts, yet existing approaches treat all part-whole relationships as equally informative. This paper proposes UNCHA (UNcertainty-guided Compositional Hyperbolic Alignment), which leverages the hyperbolic radius as an uncertainty measure to quantify the varying semantic representativeness of image parts to the whole scene. By incorporating this uncertainty into adaptive temperature scaling for contrastive learning and an entropy-regularized entailment loss, UNCHA achieves state-of-the-art performance on zero-shot classification, retrieval, and fine-grained compositional benchmarks, demonstrating that modeling heterogeneous part-whole strength is critical for complex multi-object understanding.