Modern AI services increasingly run across the computing continuum, from cloud to edge devices, yet fault management remains challenging due to resource constraints, noisy telemetry, and cascading failures. This paper proposes NeSy-Edge, a three-layer neuro-symbolic framework that performs local log parsing, causal graph construction, and root-cause analysis on edge nodes, invoking cloud LLMs only when local evidence is insufficient. The core idea is to combine lightweight symbolic caching and prior-constrained causal discovery with selective neural inference, trading off autonomy against accuracy under strict memory budgets (approximately 1500 MB).
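The escalation rule described above (stay on the edge node unless local evidence is insufficient) can be illustrated with a minimal sketch. The confidence threshold, the `Diagnosis` record, and the two callables are hypothetical names introduced here for illustration; the summary does not state how NeSy-Edge measures sufficiency of local evidence.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interface: an analyzer maps a log line to (root_cause, confidence).
Analyzer = Callable[[str], Tuple[str, float]]


@dataclass
class Diagnosis:
    root_cause: str
    confidence: float  # assumed to lie in [0, 1]
    source: str        # "edge" or "cloud"


def diagnose(log_line: str, local_rca: Analyzer, cloud_rca: Analyzer,
             threshold: float = 0.8) -> Diagnosis:
    """Run local analysis first; call the cloud model only below the threshold.

    The threshold value is an assumption, not a figure from the paper.
    """
    cause, conf = local_rca(log_line)
    if conf >= threshold:
        return Diagnosis(cause, conf, "edge")
    # Local evidence is treated as insufficient, so escalate.
    cause, conf = cloud_rca(log_line)
    return Diagnosis(cause, conf, "cloud")
```

Under this reading, raising the threshold trades autonomy for accuracy: more requests leave the edge node, but fewer low-confidence local answers are accepted.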
ORACLE addresses the problem of verifying intermediate reasoning steps in synthetic LLM training data, where filtering by final answer correctness often preserves spurious reasoning paths. The method combines a structured syllogistic template (<QUERY>, <FACTS>, <RULE>, <REVISION>) with a symbolic reasoning engine (Pyke) to validate steps during beam search, generating preference data for DPO. This hybrid approach matters because it attempts to bring formal verification to natural language reasoning tasks where code execution and pure LLM evaluation fall short.
This paper tackles Practical Test-Time Adaptation (PTTA), where models must adapt to temporally correlated, non-i.i.d. test streams without source data. Unlike prior work that stores samples in a single pool, the authors propose Multi-Cluster Memory (MCM), which organizes memory into multiple clusters based on pixel-level descriptors. The core insight, validated via Gaussian Mixture Model analysis, is that PTTA streams are inherently multi-modal (optimal K* ≈ 6–10), making single-cluster memory structurally mismatched. MCM introduces descriptor-based assignment, Adjacent Cluster Consolidation (ACC), and Uniform Cluster Retrieval (UCR), achieving consistent gains of up to 12.13% on DomainNet.
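To make the assignment and retrieval steps concrete, here is a rough sketch under stated assumptions. It assumes a descriptor is a short tuple of floats (for example per-channel pixel means), that assignment picks the nearest centroid, and that retrieval draws roughly equal counts from each non-empty cluster. These choices, the function names, and the omission of ACC are simplifications made here, not the paper's exact procedure.

```python
import random
from typing import List, Sequence


def assign_cluster(descriptor: Sequence[float],
                   centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the nearest centroid (squared Euclidean distance)."""
    def sq_dist(centroid: Sequence[float]) -> float:
        return sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def uniform_cluster_retrieval(clusters: List[list], batch_size: int,
                              rng: random.Random) -> list:
    """Draw about the same number of samples from every non-empty cluster."""
    non_empty = [c for c in clusters if c]
    if not non_empty:
        return []
    per_cluster = max(1, batch_size // len(non_empty))
    batch = []
    for cluster in non_empty:
        # Small clusters contribute everything they hold.
        batch.extend(rng.sample(cluster, min(per_cluster, len(cluster))))
    return batch[:batch_size]
```

The point of the uniform draw is that a mode which dominated the recent stream cannot crowd out the others in the adaptation batch, which is the failure a single pooled memory is prone to.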
LongCat-Flash-Prover is a 560B-parameter MoE open-source model targeting native formal reasoning in Lean4. The core innovation is decomposing formal theorem proving into three agentic capabilities—auto-formalization, sketching, and proving—trained via a Hybrid-Experts Iteration Framework and a novel RL algorithm called HisPO. The work claims state-of-the-art results on MiniF2F-Test (97.1%), ProverBench (70.8%), and PutnamBench (41.5%) with remarkably low inference budgets compared to prior open-source provers.
This paper tackles multimodal misinformation detection by distinguishing between harmful and harmless visual content manipulation—a nuance often overlooked by existing methods. The authors propose Havc-m4d, a framework that extracts manipulation and intention features using weakly-supervised positive-unlabeled (PU) learning to overcome the lack of ground-truth manipulation labels. By treating real articles with manipulated visuals as likely harmless and fake articles as potentially harmful, the method introduces intention-aware cues that consistently improve detection across four benchmark datasets.
This paper addresses temporal action localization (TAL) for distracted driver behaviors in untrimmed in-cabin videos, a critical task for intelligent transportation systems. The authors propose a two-stage framework combining VideoMAE-based feature extraction with an Augmented Self-Mask Attention (AMA) detector enhanced by a Spatial Pyramid Pooling-Fast (SPPF) module for multi-scale temporal modeling. The work targets deployment scenarios such as fleet management and transportation safety checkpoints, aiming to balance accuracy against computational constraints.
UAV vision-and-language navigation suffers from a structural mismatch between 2D visual perception and 3D trajectory decision-making. SpatialFly bridges this gap via a geometry-guided 2D representation alignment mechanism (G2RA) that injects implicit 3D geometric priors from a pretrained geometry encoder into 2D semantic tokens without explicit 3D reconstruction. Operating on RGB-only observations, the method outperforms state-of-the-art baselines on the OpenUAV benchmark, reducing navigation error by over 4 meters in unseen environments.
LPNSR tackles the efficiency-quality trade-off in diffusion-based image super-resolution, specifically improving upon the 4-step ResShift framework. The core idea is to replace random Gaussian noise in intermediate diffusion steps with an LR-guided noise predictor that approximates a theoretically derived optimal noise, while also replacing bicubic upsampling with a pretrained regression network for better initialization. The method achieves strong perceptual results without relying on large-scale text-to-image priors.
KLDrive addresses fine-grained 3D scene question answering for autonomous driving by coupling an energy-based model for reliable scene knowledge graph construction with a frozen LLM agent that reasons over a constrained symbolic action space. The core insight is that decoupling noisy perception (handled by an EBM that refines multi-source camera and LiDAR detections) from interpretable reasoning (handled by a tool-using LLM with explicit Plan-Execute-Observe loops) substantially reduces hallucinations. The system achieves 65.04% accuracy on NuScenes-QA and a 46.01-percentage-point improvement on counting tasks over prior state-of-the-art, without task-specific fine-tuning of the LLM backbone.
This paper addresses selection bias (position and label bias) in large language models during discrete-choice tasks like multiple-choice questions and pairwise evaluation. The authors propose Permutation-Aware GRPO (PA-GRPO), which extends Group Relative Policy Optimization by treating different permutations of the same question as a single training group rather than independent instances. The method enforces semantic consistency across permutations through two mechanisms: a cross-permutation advantage that computes rewards relative to the group mean, and a consistency-aware reward that penalizes disagreement across permutations. Experiments across seven benchmarks and three models (Llama-3.1-8B, Qwen3-8B, Qwen3-32B) demonstrate that PA-GRPO reduces selection bias while maintaining accuracy.
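The two mechanisms can be sketched as follows. This is an illustrative reading, not the paper's exact formulation: it assumes each permutation's answer has already been mapped back to the underlying option content, that the consistency term is a fixed penalty for disagreeing with the majority answer, and that the advantage is the reward minus the group mean with no variance normalization. The function name and the penalty value are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence


def cross_permutation_advantages(correct: Sequence[bool],
                                 canonical_answers: Sequence[str],
                                 penalty: float = 0.5) -> List[float]:
    """Advantages for one group holding all permutations of a single question.

    correct[i]           -- whether permutation i was answered correctly
    canonical_answers[i] -- the chosen option's content, independent of position
    penalty              -- assumed cost for disagreeing with the majority answer
    """
    majority, _ = Counter(canonical_answers).most_common(1)[0]
    rewards = [
        float(is_correct) - (penalty if answer != majority else 0.0)
        for is_correct, answer in zip(correct, canonical_answers)
    ]
    baseline = mean(rewards)  # shared baseline across the permutation group
    return [r - baseline for r in rewards]
```

Because the baseline is shared across permutations, a response that flips its answer when the options are reordered is scored against the orderings where the model got it right, rather than against unrelated questions.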
This paper addresses the challenge of "intelligent disobedience" in shared autonomy: cases where an assistive AI must override human commands to prevent harm while remaining helpful. The authors formalize this as the Intelligent Disobedience Game (IDG), a sequential Stackelberg game where a human leader proposes actions and an assistive follower with superior environmental awareness decides whether to obey or intervene. The framework aims to provide the mathematical foundations for training safety-critical assistive systems.
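A single round of the leader-follower interaction might look like the sketch below. The safety predicate, the utility function, and the rule "obey when safe, otherwise pick the most useful safe action" are assumptions made here for illustration; the summary does not specify the IDG's payoffs or equilibrium concept in this detail.

```python
from typing import Callable, Optional, Sequence, Tuple


def follower_response(proposed: str,
                      actions: Sequence[str],
                      is_safe: Callable[[str], bool],
                      utility: Callable[[str], float]) -> Tuple[str, Optional[str]]:
    """Decide whether the assistive follower obeys the human leader's proposal.

    is_safe encodes the follower's superior view of the environment;
    utility scores how well an action serves the human's goal (hypothetical).
    """
    if is_safe(proposed):
        return "obey", proposed
    safe_actions = [a for a in actions if is_safe(a)]
    if not safe_actions:
        # Nothing safe is available: refuse without substituting an action.
        return "intervene", None
    # Override with the safe action that remains most helpful.
    return "intervene", max(safe_actions, key=utility)
```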
This paper addresses the problem of forecasting outlier events far in advance in time series data, rather than merely detecting immediate anomalies. The authors propose a two-layer framework that first computes outlier scores using standard detection methods, then models the temporal structure of these scores to predict future anomalies. By assuming that outlier occurrences exhibit temporal patterns (e.g., periodicity or delayed dependencies), the method aims to forecast outlier likelihoods without requiring future observations.
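The two-layer idea can be illustrated with a deliberately simple stand-in for each layer. Layer one below uses absolute z-scores as the outlier score, and layer two estimates a dominant period from the autocovariance of those scores and forecasts by averaging past scores at the same phase. Both choices are assumptions made for this sketch; the paper's actual detectors and temporal models may differ.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def outlier_scores(series: Sequence[float]) -> List[float]:
    """Layer 1: absolute z-score of each observation."""
    mu, sigma = mean(series), pstdev(series)
    return [abs(x - mu) / sigma if sigma > 0 else 0.0 for x in series]


def dominant_lag(scores: Sequence[float], max_lag: int) -> int:
    """Lag in 1..max_lag with the largest autocovariance of the scores."""
    mu = mean(scores)

    def autocov(lag: int) -> float:
        return sum((scores[t] - mu) * (scores[t - lag] - mu)
                   for t in range(lag, len(scores)))

    return max(range(1, max_lag + 1), key=autocov)


def forecast_scores(scores: Sequence[float], horizon: int,
                    max_lag: int) -> Tuple[int, List[float]]:
    """Layer 2: forecast future scores from the periodic structure of past ones."""
    lag = dominant_lag(scores, max_lag)
    n = len(scores)
    forecast = []
    for h in range(horizon):
        # Average all past scores that share the phase of future step n + h.
        same_phase = [scores[t] for t in range((n + h) % lag, n, lag)]
        forecast.append(mean(same_phase))
    return lag, forecast
```

Note that the forecast uses only scores already observed, which matches the stated aim of predicting outlier likelihoods without future observations.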
This paper investigates why compressing different weight matrices in transformers leads to wildly different outcomes—from negligible impact to 20,000× perplexity increases. The authors map this structural sensitivity across five architectures, revealing that early-layer MLP up-projections are catastrophically fragile while value projections are nearly free to compress. Using Lyapunov stability theory, they explain how residual connections contract errors, and they provide machine-checked formal bounds in Lean 4 to guarantee per-matrix approximation quality.
This paper introduces ECI (Effective Contrastive Information), a training-free metric for evaluating hard-negative mining strategies in dense retrieval. The core idea is to leverage the logarithmic InfoNCE bound on mutual information combined with a harmonic mean of signal (hardness) and safety (margin) to predict downstream retrieval quality without expensive fine-tuning. The proposed metric addresses a real pain point in retrieval research: practitioners currently must run end-to-end ablation studies to evaluate negative sampling strategies, which is computationally wasteful.
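The two ingredients named above can be shown in isolation. The sketch does not reproduce how ECI combines them, which the summary leaves unspecified; it only illustrates why a harmonic mean suits a signal/safety trade-off and why the InfoNCE bound is logarithmic in the number of negatives. Function names and the assumption that both inputs are normalized to [0, 1] are introduced here.

```python
import math


def harmonic_mean(signal: float, safety: float) -> float:
    """Harmonic mean of hardness (signal) and margin (safety).

    It is low whenever either component is low, so very hard negatives that
    are likely false negatives (low safety) are not rewarded.
    """
    if signal <= 0 or safety <= 0:
        return 0.0
    return 2 * signal * safety / (signal + safety)


def infonce_mi_ceiling(num_negatives: int) -> float:
    """Largest mutual information (in nats) the InfoNCE bound can certify.

    InfoNCE gives I(q; d) >= log(N) - L, with N = num_negatives + 1 candidates
    and loss L >= 0, so the bound saturates at log(N).
    """
    return math.log(num_negatives + 1)
```

For example, a mining strategy with signal 0.9 and safety 0.1 gets a harmonic mean of about 0.18, well below the arithmetic mean of 0.5, while a balanced 0.5/0.5 strategy keeps 0.5.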
This paper proposes automating the entire cognitive science discovery pipeline—experiment design, behavioral data simulation via foundation models, model synthesis through LLM program generation, and iterative refinement via an "interestingness" critic—to overcome the slow pace and bias of manual research. The vision is a high-throughput in-silico engine that searches vast algorithmic and experimental spaces to surface theoretically informative mechanisms for human validation.
AutoMOOSE introduces a multi-agent AI framework to automate the full lifecycle of phase-field simulations in MOOSE, from natural-language prompts to quantitative kinetics analysis. The system orchestrates five specialized agents that generate syntactically valid input files, execute parallel parameter sweeps, autonomously recover from convergence failures, and verify physical consistency through Arrhenius analysis. Validated on copper grain growth, it demonstrates that LLM-driven orchestration can bridge the gap between scientific intent and executable multiphysics simulations, yielding results statistically comparable to expert-authored workflows.
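The Arrhenius check mentioned above amounts to fitting a straight line to ln k against 1/T, since k = A exp(-Q / (R T)) implies ln k = ln A - Q / (R T). The following is a minimal least-squares sketch of that fit, not AutoMOOSE's own verification code; the function name is hypothetical, and the inputs are assumed to be temperatures in kelvin and positive rate constants extracted from a parameter sweep.

```python
import math
from typing import Sequence, Tuple

GAS_CONSTANT = 8.314462618  # J / (mol K)


def fit_arrhenius(temps_k: Sequence[float],
                  rates: Sequence[float]) -> Tuple[float, float]:
    """Return (activation energy Q in J/mol, prefactor A) from an Arrhenius fit."""
    xs = [1.0 / t for t in temps_k]        # 1/T
    ys = [math.log(k) for k in rates]      # ln k
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Ordinary least squares: slope = -Q/R, intercept = ln A.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return -slope * GAS_CONSTANT, math.exp(intercept)
```

A physical-consistency check in this spirit would compare the fitted Q against the activation energy supplied as simulation input and flag a sweep whose ln k versus 1/T points deviate from a line.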
NVIDIA introduces Nemotron 3, a family of open language models (Nano, Super, Ultra) built on a hybrid Mamba-Transformer MoE architecture. The core innovation is using selective attention layers combined with Mamba-2 state space layers to achieve high throughput while maintaining accuracy. Key technical contributions include LatentMoE (dimensionality-reduced expert routing), NVFP4 training for efficiency, and multi-environment RL post-training. The paper positions these models as optimized for agentic AI with up to 1M token contexts and granular inference-time reasoning budget control.