Understanding collective human intent from noisy, conflicting public discourse represents a frontier AI challenge that extends beyond individual instruction-following. This paper introduces COIN-Bench, a live-updating benchmark comprising 200k+ real consumer discussions across 1,400+ products, which operationalizes an Active Probing Paradigm requiring LLMs to act as meta-analysts and reconstruct chaotic feedback into structured questionnaires. The work matters because it shifts evaluation from transactional action prediction to hierarchical consensus synthesis, testing whether models can resolve contradictions and infer latent trends from swarm-like intelligence.
DMMRL tackles molecular property prediction by addressing two key challenges: entangled representations that obscure structure-property relationships and naïve multi-modal fusion that ignores inter-modal dependencies. The method uses variational autoencoders to decompose graph, sequence, and geometry features into shared (structure-relevant) and private (modality-specific) latent subspaces, enforcing orthogonality between them. A gated attention mechanism then fuses only the shared representations for downstream prediction.
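The two mechanisms described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DMMRL's implementation. The orthogonality term is written as the mean squared cosine between each sample's shared and private codes, which is one common form of the constraint. The fusion uses a simplified gated-attention score, tanh(v·z) multiplied by sigmoid(u·z), where the vectors `v` and `u` are hypothetical stand-ins for learned weights. Only the shared codes enter the fusion.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)

def orthogonality_loss(shared, private):
    # Mean squared cosine between each sample's shared and private code;
    # zero when every pair is orthogonal (assumed form of the constraint).
    return sum(cosine(s, p) ** 2 for s, p in zip(shared, private)) / len(shared)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_attention_fuse(shared_by_modality, v, u):
    # Score each modality's shared vector with a tanh branch gated by a
    # sigmoid branch; v and u are hypothetical learned weight vectors.
    scores = []
    for z in shared_by_modality:
        s_tanh = math.tanh(sum(a * b for a, b in zip(v, z)))
        s_gate = sigmoid(sum(a * b for a, b in zip(u, z)))
        scores.append(s_tanh * s_gate)
    # Softmax over modalities (max-subtracted for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(shared_by_modality[0])
    fused = [sum(w * z[i] for w, z in zip(weights, shared_by_modality)) for i in range(dim)]
    return fused, weights
```

In use, `shared_by_modality` would hold the graph, sequence, and geometry shared codes for one molecule, and the orthogonality loss would be added to the VAE and prediction losses during training.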
Multi-modal tracking suffers from scarce paired training data, forcing reliance on RGB pre-trained models with lightweight fine-tuning. PATrack proposes a progressive adaptation framework using three complementary adapters—Modality-Dependent (MDA), Cross-Modality Entangled (CEA), and Head Adaptation (HA)—to bridge the domain gap between RGB and auxiliary modalities (Thermal, Depth, Event) at the intra-modal, inter-modal, and task levels. The approach decomposes features into frequency bands and uses fusion-guided cross-attention, yielding state-of-the-art results on LasHeR, RGBT234, and VisEvent benchmarks.
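The idea of splitting features into frequency bands can be illustrated with a one-dimensional toy example. This sketch assumes a simple box filter as the low-pass operator and takes the residual as the high-frequency band. PATrack's actual decomposition is not specified here, and the `radius` parameter is hypothetical. By construction the two bands sum back to the input, so an adapter can process each band separately without losing information.

```python
def box_low_pass(x, radius=1):
    # Moving average over a window of up to 2*radius+1 samples (truncated at edges).
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def split_bands(x, radius=1):
    # Low band = smoothed signal; high band = residual detail (edges, texture).
    low = box_low_pass(x, radius)
    high = [a - b for a, b in zip(x, low)]
    return low, high
```
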
DomAgent addresses the challenge of generating code for specialized domains like truck control systems or data science libraries, where generic LLMs often fail due to lack of domain knowledge. The system combines structured knowledge graphs (top-down reasoning) with case-based retrieval (bottom-up learning) through a novel DomRetriever module that iteratively refines context via LLM-based review. Experiments on both the DS-1000 benchmark and a real-world truck software dataset demonstrate substantial improvements, enabling small 7B-8B parameter models to approach or exceed the performance of proprietary systems like GPT-4o.
This paper tackles b-jet tagging at the LHC, in particular the difficult discrimination between bottom-quark jets (b-jets) and charm-quark jets (c-jets). The authors propose ECT (Edge Convolution Transformer), a hybrid deep learning architecture that combines local feature extraction via EdgeConv blocks with global context modeling through transformer self-attention. The work is motivated by the need for real-time flavor tagging in high-level trigger systems, where both accuracy and inference latency are critical.
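The local half of the hybrid design is an EdgeConv operation (the operator introduced with Dynamic Graph CNN). Below is a minimal sketch of one EdgeConv layer, assuming each jet constituent is a point with a small feature vector. It builds a k-nearest-neighbour graph, forms the edge feature [x_i, x_j - x_i], applies one linear layer with ReLU, and max-aggregates over neighbours. The single linear layer and the `weight` matrix are simplifying assumptions, not ECT's configuration.

```python
def knn(points, k):
    # Indices of the k nearest other points (squared Euclidean distance).
    nbrs = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        nbrs.append([j for _, j in dists[:k]])
    return nbrs

def edge_conv(points, k, weight):
    # Edge feature h(x_i, x_j - x_i) = ReLU(W [x_i, x_j - x_i]),
    # followed by element-wise max over the k neighbours of each point.
    out = []
    for i, nbr in enumerate(knn(points, k)):
        xi = points[i]
        agg = None
        for j in nbr:
            edge = list(xi) + [b - a for a, b in zip(xi, points[j])]
            h = [max(0.0, sum(w * e for w, e in zip(row, edge))) for row in weight]
            agg = h if agg is None else [max(a, b) for a, b in zip(agg, h)]
        out.append(agg)
    return out
```

The output of such a layer would then be passed to the transformer blocks, which attend over all constituents to add jet-level context.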
Phase unwrapping recovers absolute interferometric phase from wrapped $2\pi$-modulo observations, but fails near surface-breaking faults that create abrupt discontinuities and in large-scale scenes that exceed GPU memory. This work proposes a diffusion-based framework that conditions on SNAPHU estimates and processes large interferograms via overlapping 256$\times$256 tiles with weighted averaging. It claims to handle fault-related phase jumps and scale to real-world Sentinel-1 interferograms without resizing.
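The overlapping-tile strategy can be sketched as follows. Tiles are placed with a fixed stride (the last tile is snapped to the image border), each tile is processed independently, and overlapping outputs are blended with a window that is highest at the tile centre. The 256$\times$256 tile size comes from the text. The 192-pixel stride, the tent-shaped window, and the `process` callback (standing in for the diffusion model) are assumptions. The sketch shows only the blending step and does not address reconciling per-tile $2\pi$ offsets.

```python
def tile_starts(size, tile, stride):
    # Start offsets along one axis; the last tile is snapped to the border.
    starts = list(range(0, size - tile + 1, stride))
    if starts[-1] != size - tile:
        starts.append(size - tile)
    return starts

def tent_weights(tile):
    # Strictly positive weights peaking at the tile centre (assumed window).
    c = (tile - 1) / 2
    return [1.0 - abs(i - c) / (c + 1) for i in range(tile)]

def blend_tiles(image, process, tile=256, stride=192):
    # Weighted average of per-tile outputs over all overlapping tiles.
    H, W = len(image), len(image[0])
    th, tw = min(tile, H), min(tile, W)
    wy, wx = tent_weights(th), tent_weights(tw)
    acc = [[0.0] * W for _ in range(H)]
    wsum = [[0.0] * W for _ in range(H)]
    for y0 in tile_starts(H, th, stride):
        for x0 in tile_starts(W, tw, stride):
            patch = [row[x0:x0 + tw] for row in image[y0:y0 + th]]
            out = process(patch)  # hypothetical per-tile model call
            for i in range(th):
                for j in range(tw):
                    w = wy[i] * wx[j]
                    acc[y0 + i][x0 + j] += w * out[i][j]
                    wsum[y0 + i][x0 + j] += w
    return [[acc[i][j] / wsum[i][j] for j in range(W)] for i in range(H)]
```

Because the stride is smaller than the tile, every pixel is covered at least once, and the centre-weighted window reduces seams where tile predictions disagree.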
GRPO training for LLM reasoning suffers from expensive rollouts and from compute wasted on zero-variance prompts, where every sampled answer is correct or every one is wrong, so the group-relative advantage is zero. This paper proposes Prompt Replay, an overhead-free online method that buffers and reuses medium-difficulty prompts (pass rate near 0.5) to maximize gradient signal while staying on-policy by regenerating responses. By mixing replayed prompts with fresh samples and controlling reuse via cooldown steps and caps, the method aims to accelerate early training, though its performance eventually plateaus at the baseline level.
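The replay mechanics can be sketched as a small buffer. It stores prompts only, never responses, so replayed prompts are re-rolled by the current policy. Admission is limited to a medium pass-rate band, a cooldown controls when a prompt can be reused, and a cap limits how many times it is reused. The band limits (0.25 to 0.75), cooldown, cap, replay fraction, and all class and function names are hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    prompt: str
    pass_rate: float
    last_step: int
    uses: int = 0

class PromptReplayBuffer:
    """Hypothetical buffer of medium-difficulty prompts (prompts only)."""

    def __init__(self, low=0.25, high=0.75, cooldown=2, max_uses=3):
        self.low, self.high = low, high
        self.cooldown, self.max_uses = cooldown, max_uses
        self.entries = {}

    def add(self, prompt, pass_rate, step):
        # Keep only medium-difficulty prompts; pass rates of 0 or 1 give
        # zero GRPO advantage and therefore no gradient signal.
        if not (self.low <= pass_rate <= self.high):
            self.entries.pop(prompt, None)
            return
        if prompt in self.entries:
            self.entries[prompt].pass_rate = pass_rate
        else:
            self.entries[prompt] = Entry(prompt, pass_rate, step)

    def sample(self, n, step):
        # Eligible prompts have waited at least `cooldown` steps since last use.
        ready = [e for e in self.entries.values() if step - e.last_step >= self.cooldown]
        # Prefer prompts whose pass rate is closest to 0.5.
        ready.sort(key=lambda e: abs(e.pass_rate - 0.5))
        chosen = ready[:n]
        for e in chosen:
            e.uses += 1
            e.last_step = step
            if e.uses >= self.max_uses:
                del self.entries[e.prompt]  # reuse cap reached
        return [e.prompt for e in chosen]

def build_batch(fresh_prompts, buffer, batch_size, replay_fraction, step):
    # Replayed prompts are regenerated by the current policy (on-policy).
    replayed = buffer.sample(int(batch_size * replay_fraction), step)
    return replayed + fresh_prompts[: batch_size - len(replayed)]
```

After each step, the observed pass rates of both fresh and replayed prompts would be fed back through `add`, so prompts that become too easy or too hard drop out of the buffer.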
This paper tackles the lack of explicit memory mechanisms in transformers by introducing Mixture of Chapters (MoC), a learned bank of 262K latent memory tokens accessed via cross-attention. To scale memory without prohibitive costs, the authors partition the bank into chapters and route each input sequence to a sparse subset (the top-64 chapters), reducing cross-attention complexity from $O(L \cdot N_m)$ to $O(L \cdot k \cdot T)$, where $L$ is the sequence length, $N_m$ the number of memory tokens, $k$ the number of selected chapters, and $T$ the number of tokens per chapter. The work demonstrates that explicit associative memory can serve as a new axis of scaling, showing improved knowledge retention when transitioning from pretraining to instruction fine-tuning.
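The routing and its cost saving can be illustrated on toy sizes. In this sketch each chapter has a key equal to the mean of its memory tokens, the sequence is summarized by mean-pooling, the top-k chapters are selected by dot product, and each token then cross-attends only to the selected chapters' tokens. The pooled-query router, the chapter keys, and the use of memory tokens as both keys and values are assumptions, not MoC's exact design. The function also returns the number of attention scores computed, which is $L \cdot k \cdot T$ rather than $L \cdot N_m$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def route_chapters(sequence, chapter_keys, k):
    # Hypothetical router: pooled sequence query scored against chapter keys.
    query = mean(sequence)
    scores = [dot(query, key) for key in chapter_keys]
    return sorted(range(len(chapter_keys)), key=lambda c: scores[c], reverse=True)[:k]

def cross_attend(sequence, chapters, selected):
    # Each token attends only to memory tokens in the selected chapters.
    memory = [m for c in selected for m in chapters[c]]
    d = len(sequence[0])
    out = []
    for x in sequence:
        logits = [dot(x, m) / math.sqrt(d) for m in memory]
        top = max(logits)
        w = [math.exp(l - top) for l in logits]
        z = sum(w)
        out.append([sum(wi * m[i] for wi, m in zip(w, memory)) / z for i in range(d)])
    # Second value: number of query-memory scores computed (L * k * T).
    return out, len(sequence) * len(memory)
```

With three chapters of two tokens each and $k = 1$, a two-token sequence computes 4 scores instead of the 12 that dense attention over the whole bank would need.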
LLMs with chain-of-thought reasoning can perform hidden internal computations across filler tokens, threatening AI safety by enabling obfuscated reasoning. This paper proposes an early-exit transformer architecture that trains models to truncate forward passes at intermediate layers when tokens are predictable, aiming to force reasoning into externalized CoT rather than internal activations. The approach uses self-distillation to calibrate exit probabilities followed by RL with a layer-depth penalty, showing on small models that adaptive depth reduction can maintain task performance while reducing computation.
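The two ingredients of the RL stage, an exit decision and a layer-depth penalty, can be sketched as follows. A token exits at the first layer whose exit probability clears a threshold. The reward is the task reward minus a penalty proportional to the mean fraction of layers each token actually executed. The threshold, the penalty weight `lam`, and the deterministic thresholding (training might sample exits instead) are assumptions. The self-distillation stage that calibrates the exit probabilities is not shown.

```python
def exit_layer(exit_probs, threshold=0.5):
    # Index of the first layer whose exit probability clears the threshold;
    # fall back to the final layer if none does.
    for layer, p in enumerate(exit_probs):
        if p >= threshold:
            return layer
    return len(exit_probs) - 1

def depth_penalized_reward(task_reward, exit_layers, n_layers, lam=0.1):
    # Subtract lam times the mean fraction of layers executed per token.
    mean_depth = sum((layer + 1) / n_layers for layer in exit_layers) / len(exit_layers)
    return task_reward - lam * mean_depth
```

Under this shaping, a policy that answers correctly while exiting early earns more reward than one that uses full depth, which pushes computation out of hidden layers and into the visible chain of thought.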
This paper addresses reward hacking in reward-centric diffusion reinforcement learning (RDRL), where diffusion models exploit non-robust reward models to achieve high scores without actual perceptual quality improvements. The authors propose RSA-FT (Reward Sharpness-Aware Fine-Tuning), which mitigates hacking by flattening the reward landscape through joint perturbations in both image space (adversarial training) and parameter space (Sharpness-Aware Minimization). The method is plug-and-play, compatible with existing RDRL frameworks like ReFL and DRaFT, and shows consistent gains across SD1.5, SDXL, SD3, and Flux backbones.
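The parameter-space half of the method is Sharpness-Aware Minimization. Below is a minimal SAM step on a toy loss, where in RDRL the loss would be the negative reward. The step first moves to the first-order worst-case point within an L2 ball of radius `rho`, then descends using the gradient taken there. The image-space adversarial term applies the same kind of first-order worst-case step to the input rather than to the weights, and is not shown. The toy loss and hyperparameters are assumptions.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step on a loss (e.g. loss = -reward)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Ascend to the first-order worst-case point within an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descend from the original weights using the gradient at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def toy_grad(w):
    # Gradient of the toy loss f(w) = w0**2 + 10 * w1**2 (sharp along w1).
    return [2.0 * w[0], 20.0 * w[1]]
```

Starting from `[1.0, 0.0]`, one step gives about `[0.79, 0.0]`, compared with `[0.8, 0.0]` for plain gradient descent, because the gradient is taken at the perturbed point `[1.05, 0.0]`.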
Thyroid ultrasound reporting requires joint assessment of nodule boundaries and TI-RADS risk categories, yet annotator variability creates inconsistent supervision that destabilizes standard multitask learning. This paper proposes RLAR (Representation-Level Adversarial Regularization), which uses normalized adversarial directions in latent space as geometric probes of task sensitivity and penalizes excessive angular alignment between task gradients to control negative transfer. Combined with a clinically guided embedding that distills TI-RADS-aligned radiomics targets during training, the framework aims to stabilize joint segmentation and classification while grounding predictions in interpretable evidence.
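The geometric probe and the alignment penalty can be sketched as follows. Each task's loss gradient with respect to the shared latent code is normalized to give an adversarial direction, and the cosine between the segmentation and classification directions is penalized only above a margin. The squared-hinge form, the margin `tau`, and the step size `eps` are hypothetical choices, not RLAR's published formulation.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) + 1e-12
    return [x / n for x in v]

def adversarial_probe(z, task_grad, eps=0.1):
    # Move the latent code a small step along the normalized loss-ascent direction.
    d = normalize(task_grad)
    return [zi + eps * di for zi, di in zip(z, d)]

def alignment_penalty(grad_seg, grad_cls, tau=0.5):
    # Cosine between the two tasks' normalized latent-space directions;
    # only the alignment above the margin tau is penalized (squared hinge).
    cos = sum(a * b for a, b in zip(normalize(grad_seg), normalize(grad_cls)))
    return max(0.0, cos - tau) ** 2, cos
```

Orthogonal task directions incur no penalty, while perfectly aligned directions incur $(1 - \tau)^2$.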
This paper reframes plasticity loss in deep reinforcement learning as an optimization pathology rather than capacity degradation. The core claim, dubbed the Optimization-Centric Plasticity (OCP) hypothesis, is that parameters become trapped in local optima of previous tasks that are poor optima for new ones. The authors prove that neuron dormancy is mathematically equivalent to zero-gradient states and show that plasticity recovers when tasks differ sufficiently, suggesting that networks retain their capacity but are held back by optima specific to earlier tasks.
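One direction of the dormancy and zero-gradient link is easy to check for a single ReLU unit. If the pre-activation is non-positive on every input in the batch, the ReLU derivative is zero everywhere, so the unit's incoming weights and bias receive no gradient regardless of the upstream signal. This sketch treats "dormant" as exact inactivity on the batch, which is a simplifying assumption. Practical dormancy scores typically use a small activation threshold instead. The paper's proof is more general than this illustration.

```python
def is_dormant(w, b, inputs):
    # Dormant here means inactive (pre-activation <= 0) on every input.
    return all(sum(wi * xi for wi, xi in zip(w, x)) + b <= 0 for x in inputs)

def relu_neuron_grads(w, b, inputs, upstream):
    # Gradients of a ReLU unit's incoming weights and bias over a batch,
    # given the upstream gradient dL/d(output) for each example.
    gw = [0.0] * len(w)
    gb = 0.0
    for x, g in zip(inputs, upstream):
        pre = sum(wi * xi for wi, xi in zip(w, x)) + b
        if pre > 0:  # ReLU derivative is 1 only on the active side
            gb += g
            for i, xi in enumerate(x):
                gw[i] += g * xi
    return gw, gb
```
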
Designing high-performance system heuristics traditionally requires human experts to navigate multi-step conceptual shifts. This paper introduces Engram, an agentic architecture that sidesteps the 'coherence ceiling' of single-context LLM agents and the 'evolutionary neighborhood bias' of code-mutation systems by decoupling long-horizon exploration into sequential agent handoffs. Each agent distills findings into a persistent Research Digest, enabling cumulative progress without context degradation.
Standard AlphaZero-style tree search for LLM reasoning suffers from a scaling failure: on GSM8K and Game24, accuracy actually drops as the search budget increases beyond moderate levels. This paper introduces ReSCALE, which replaces PUCT selection and Dirichlet noise with Gumbel sampling and Sequential Halving—a best-arm identification technique from multi-armed bandits. The key insight is that root action-selection design is critical for budget-scalable reasoning without any changes to the model or its training.
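Root selection with Gumbel sampling and Sequential Halving can be sketched as follows, in the style of Gumbel MuZero. Gumbel noise is added to the policy logits, the top-m actions are kept (Gumbel-Top-k sampling without replacement), and the budget is split across halving phases. In each phase the surviving candidates are simulated equally and the worse half is discarded. The `simulate` callback, the linear value weight `c` (in place of a visit-count-dependent transform), and the budget split are simplifying assumptions, not ReSCALE's exact settings.

```python
import math
import random

def gumbel_sequential_halving(logits, simulate, budget, m, c=1.0, rng=None):
    rng = rng or random.Random(0)
    # Gumbel(0, 1) noise; the max() guard avoids log(0).
    gumbel = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    # Gumbel-Top-k: the m highest perturbed logits form the candidate set.
    candidates = sorted(range(len(logits)), key=lambda a: gumbel[a] + logits[a], reverse=True)[:m]
    visits = {a: 0 for a in candidates}
    value_sum = {a: 0.0 for a in candidates}

    def score(a):
        # Perturbed prior plus weighted mean simulated value.
        return gumbel[a] + logits[a] + c * value_sum[a] / visits[a]

    phases = max(1, math.ceil(math.log2(len(candidates))))
    remaining = list(candidates)
    for _ in range(phases):
        # Split this phase's share of the budget evenly across survivors.
        per_action = max(1, budget // (phases * len(remaining)))
        for a in remaining:
            for _ in range(per_action):
                value_sum[a] += simulate(a)  # hypothetical rollout/evaluator
                visits[a] += 1
        # Keep the better half (at least one action).
        remaining = sorted(remaining, key=score, reverse=True)[: max(1, len(remaining) // 2)]
        if len(remaining) == 1:
            break
    return remaining[0]
```

Because candidates are eliminated by halving rather than by accumulating PUCT visit counts, adding more budget gives each survivor more simulations instead of shifting exploration toward noisy priors. This is the property the paper links to budget scalability.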
ViCLSR adapts supervised contrastive learning (SimCSE-style) to Vietnamese NLU by converting NLI entailment and contradiction pairs into positive and negative training signals. Built on XLM-R Large (550M), the framework improves sentence embeddings for low-resource Vietnamese, reporting an F1 gain of +6.97% over PhoBERT on ViNLI and state-of-the-art results across five downstream tasks including fact-checking and machine reading comprehension.
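The supervised SimCSE-style objective can be written out directly. Each premise embedding is pulled toward its entailment hypothesis and contrasted against all in-batch entailment hypotheses plus all contradiction hypotheses, which act as hard negatives. This is cross-entropy over cosine similarities scaled by a temperature. The temperature `tau=0.05` and the toy embeddings are assumed values, not ViCLSR's reported configuration.

```python
import math

def cos(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)

def supervised_simcse_loss(anchors, positives, negatives, tau=0.05):
    # anchors[i]: premise; positives[i]: its entailment; negatives[i]: its contradiction.
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cos(anchors[i], positives[j]) / tau for j in range(n)]
        logits += [cos(anchors[i], negatives[j]) / tau for j in range(n)]
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denom - logits[i]  # -log softmax of the true positive
    return total / n
```

Swapping the roles of entailment and contradiction pairs raises the loss sharply, which is the training signal that shapes the embedding space.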
This paper investigates whether neural text-to-speech systems capture consonant-induced F0 perturbation—fine-grained phonetic effects where voiceless obstruents raise and voiced obstruents lower fundamental frequency relative to sonorants. The authors propose a segmental-level prosodic probing framework comparing Tacotron 2 and FastSpeech 2 against natural speech, stratifying by lexical frequency to test memorization versus abstraction. This matters because TTS evaluation often misses sub-phonemic articulatory detail that distinguishes human-like phonetic competence from surface pattern matching.
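The size of a consonant-induced F0 perturbation is usually expressed in semitones relative to a reference. This is $12 \log_2(f / f_{\text{ref}})$. The sketch below averages vowel-onset F0 per consonant class and reports each class relative to the sonorant baseline. The data layout, the class labels, and the choice of onset measurement are assumptions for illustration.

```python
import math

def semitones(f, ref):
    # Interval from ref to f in semitones (12 per octave).
    return 12.0 * math.log2(f / ref)

def f0_perturbation(onset_f0_by_class, baseline="sonorant"):
    # Mean onset F0 per consonant class, in semitones relative to the baseline class.
    means = {k: sum(v) / len(v) for k, v in onset_f0_by_class.items()}
    ref = means[baseline]
    return {k: semitones(m, ref) for k, m in means.items() if k != baseline}
```

A voiceless-onset mean of 220 Hz against a 200 Hz sonorant mean is a raise of about 1.65 semitones. The probing question is whether synthesized speech shows the same signs and magnitudes as natural speech, separately for high- and low-frequency words.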
BadGraph proposes the first universal adversarial attack on text-attributed graphs (TAGs) that works across both GNN and LLM backbones. The core idea: use an LLM agent to perturb topology and text jointly, creating 'cross-modal shortcuts' to mislead models without gradient access. This matters because TAG security is understudied and existing attacks fail when models use rich text encoders like SBERT or TAPE.
Forward-looking sonar images suffer from severe speckle noise, acoustic shadows, and energy attenuation that break standard semi-supervised teacher-student frameworks. This paper proposes CTFS, a collaborative multi-teacher architecture where one general teacher and two sonar-specific teachers (simulating acoustic shadows and energy decay) alternate to guide a student model. A cross-teacher reliability assessment mechanism filters noisy pseudo-labels by measuring prediction consistency across teacher views. The work matters because sonar annotation is expensive and existing methods fail when less than 10% of the data is labeled, due to domain mismatch.
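Cross-teacher reliability filtering can be sketched per pixel. Each teacher produces a class-probability vector, the candidate pseudo-label is the argmax of the teachers' mean, and the label is kept only if enough teachers agree and the mean confidence is high. Unreliable pixels become `None` and would be ignored in the student loss. The agreement and confidence thresholds, and the per-pixel formulation itself, are assumptions rather than CTFS's exact criterion. The teacher-specific shadow and attenuation views are not simulated here.

```python
def argmax(v):
    return max(range(len(v)), key=v.__getitem__)

def filter_pseudo_labels(teacher_probs, min_agreement=1.0, min_conf=0.7):
    # teacher_probs[t][p] is teacher t's class-probability vector for pixel p.
    n_teachers = len(teacher_probs)
    labels = []
    for p in range(len(teacher_probs[0])):
        votes = [argmax(teacher_probs[t][p]) for t in range(n_teachers)]
        n_classes = len(teacher_probs[0][p])
        mean = [sum(teacher_probs[t][p][c] for t in range(n_teachers)) / n_teachers
                for c in range(n_classes)]
        label = argmax(mean)
        agreement = votes.count(label) / n_teachers
        if agreement >= min_agreement and mean[label] >= min_conf:
            labels.append(label)
        else:
            labels.append(None)  # unreliable: excluded from the student loss
    return labels
```
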
Abductive reasoning—inferring the most probable hypothesis from incomplete observations—remains a critical gap for LLMs despite advances in deductive and inductive tasks. This paper introduces Graph of States (GoS), a neuro-symbolic framework that structures multi-agent collaboration through a causal graph (encoding belief states) and a state machine (governing navigation). By grounding reasoning in explicit symbolic constraints rather than unstructured context, GoS aims to eliminate Evidence Fabrication, Context Drift, Failed Backtracking, and Early Stopping that plague Chain-of-Thought and Tree-of-Thought when adapted to dynamic, non-monotonic abductive tasks like medical diagnosis and distributed systems failure analysis.
ConsRoute tackles the challenge of routing queries across cloud-edge-device LLM tiers by proposing a consistency-aware approach that uses reranker-based semantic similarity rather than scalar quality gaps. The core innovation lies in reusing device-side LLM (DLM) prefilling hidden states as query representations and applying cluster-specific adaptive thresholds learned via Bayesian optimization. This addresses the tension between response quality and latency/cost in resource-constrained mobile environments, claiming to achieve ≥95% of cloud accuracy while cutting latency and cost by ~40%.
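The online routing rule with cluster-specific thresholds can be sketched as follows. A query representation (in ConsRoute, derived from the device LLM's prefilling hidden states) is assigned to its nearest cluster centroid. The predicted consistency of each cheaper tier is then compared against that cluster's thresholds, escalating from device to edge to cloud. The `predicted_consistency` inputs stand in for the reranker-based estimate, and the thresholds are fixed numbers here, whereas the paper tunes them with Bayesian optimization. All names and values are hypothetical.

```python
def nearest_cluster(h, centroids):
    # Index of the centroid closest to h (squared Euclidean distance).
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(h, centroids[c])))

def route(h, predicted_consistency, centroids, thresholds):
    # Try the cheapest tier first; escalate when predicted consistency with
    # the cloud answer falls below this cluster's threshold.
    c = nearest_cluster(h, centroids)
    for tier in ("device", "edge"):
        if predicted_consistency[tier] >= thresholds[c][tier]:
            return tier, c
    return "cloud", c
```

Per-cluster thresholds let easy query types stay on-device with a lenient threshold while harder types use stricter ones. This is how the quality-latency-cost trade-off can differ across regions of query space.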