CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter
Most Time Series Foundation Models treat channels independently and ignore cross-channel correlations, which limits their performance on multivariate forecasting. This paper proposes CoRA (CoRrelation-aware Adapter), a lightweight plug-in that learns three correlation types—dynamic (time-varying), heterogeneous (positive/negative), and partial (sparse)—through a low-rank decomposition and dual contrastive learning. The key insight is that these correlations can be captured during fine-tuning without re-pretraining the foundation model, and with only linear complexity at inference time.
The paper presents a well-engineered adapter that convincingly improves few-shot forecasting across six different foundation models on ten benchmarks. The decomposition of correlation matrices into Time-Varying and Time-Invariant low-rank components (Equation 5) and the use of learnable polynomials for dynamic patterns are technically sound. The Heterogeneous-Partial Contrastive Learning framework effectively avoids the $O(N^2)$ complexity during inference by moving the heavy computation to training only. However, the tripartite taxonomy of correlations (DCorr, HCorr, PCorr) feels somewhat constructed for narrative convenience rather than emerging organically from the data, and the theoretical guarantees (Theorems 1-2) rely on strong local stationarity assumptions that are not validated empirically.
The low-rank factorization $Q_t V Q_t^T$ (with $Q_t \in \mathbb{R}^{N \times M}$, $M < N$) reduces parameters while preserving expressiveness, and Theorem 1 proves functional equivalence to additive decomposition. The empirical results are robust: CoRA consistently outperforms baselines across all datasets in 5% few-shot settings (Table 1), and ablations confirm that removing either DCE or HPCL degrades performance (Table 2). Most importantly, the inference complexity is strictly $O(N)$, making the adapter practical for high-channel scenarios like Electricity ($N=321$).
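To make the parameter savings concrete, the following is a minimal NumPy sketch of the factorization $Q_t V Q_t^T$. It is not taken from the authors' code. The polynomial form $Q_t = \sum_k t^k Q_k$ with a normalized time index $t$, the symmetric core $V$, and the names `low_rank_correlation` and `Q_coeffs` are assumptions made for illustration; only $N=321$, $M=N/4$, and $K=1$ come from the paper's reported settings.

```python
import numpy as np

def low_rank_correlation(Q_coeffs, V, t):
    """Return (Q_t, C_t) with Q_t = sum_k t**k * Q_coeffs[k] and C_t = Q_t V Q_t^T.

    The polynomial parameterization of Q_t is an assumed form, not the paper's exact one.
    """
    Q_t = sum((t ** k) * Q_k for k, Q_k in enumerate(Q_coeffs))
    return Q_t, Q_t @ V @ Q_t.T

rng = np.random.default_rng(0)
N, K = 321, 1          # channels (Electricity) and polynomial degree from Appendix F
M = N // 4             # rank M = N/4, rounded down to 80

# Hypothetical learnable parameters: K+1 coefficient matrices and one M x M core.
Q_coeffs = [0.1 * rng.standard_normal((N, M)) for _ in range(K + 1)]
A = rng.standard_normal((M, M))
V = (A + A.T) / 2      # symmetric core (assumption) so that C_t is symmetric

Q_t, C_t = low_rank_correlation(Q_coeffs, V, t=0.5)

# Applying the factors in sequence gives the same result as the dense N x N matrix.
X = rng.standard_normal((N, 16))            # per-channel features, width 16 is arbitrary
via_factors = Q_t @ (V @ (Q_t.T @ X))

low_rank_params = (K + 1) * N * M + M * M   # 57760
full_params = N * N                         # 103041 for a single dense matrix
```

Under these assumptions the factorized form stores 57,760 parameters for the whole time-varying family, against 103,041 for one dense $N \times N$ matrix, and the resulting matrix has rank at most $M$.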
First, the three correlation types (DCorr, HCorr, PCorr) are defined mathematically in Appendix A but their mutual exclusivity and practical identifiability are not demonstrated. Second, Theorem 2's error bound for polynomials assumes the underlying correlation is a smooth function of the basis $q$, which may not hold for abrupt regime changes or non-stationary real-world data. Third, the HPCL loss (Equation 11) relies on the estimated correlation matrix $M_t^{corr}$ to define positive/negative pairs; if DCE estimates are noisy, the contrastive signal may be incorrect, yet no analysis of error propagation is provided. Finally, the 'learnable threshold' $\epsilon$ (Equation 10) lacks implementation details—how is it optimized and does it remain stable?
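The error-propagation concern can be illustrated with a small sketch. It assumes one plausible reading of Equations 10 and 11, namely that pairs with estimated correlation above $\epsilon$ are treated as positive, pairs below $-\epsilon$ as negative, and the rest are ignored. The function `pair_labels`, the toy matrices, and the value $\epsilon = 0.3$ are hypothetical and are not taken from the paper.

```python
import numpy as np

def pair_labels(corr, eps):
    """Label channel pairs: +1 positive, -1 negative, 0 ignored (including the diagonal)."""
    labels = np.zeros_like(corr, dtype=int)
    labels[corr > eps] = 1
    labels[corr < -eps] = -1
    np.fill_diagonal(labels, 0)
    return labels

# Toy "true" correlation among four channels (hypothetical values).
true_corr = np.array([
    [ 1.00,  0.60, -0.50, 0.05],
    [ 0.60,  1.00, -0.35, 0.10],
    [-0.50, -0.35,  1.00, 0.00],
    [ 0.05,  0.10,  0.00, 1.00],
])

# Symmetric estimation error standing in for a noisy DCE output.
noise = np.array([
    [0.0, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0],
])

eps = 0.3
clean = pair_labels(true_corr, eps)
noisy = pair_labels(true_corr + noise, eps)

# Upper-triangular index pairs whose contrastive label changed because of the noise.
changed = np.argwhere(np.triu(clean != noisy, k=1))
print(changed.tolist())  # [[0, 3], [1, 2]]
```

In this toy case, pair (0, 3) becomes a spurious positive and pair (1, 2) loses its negative label, even though the perturbation is modest. An analysis of how often such flips occur during training would address the concern directly.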
The experimental comparison is comprehensive across 10 datasets and 6 backbones (GPT4TS, UniTime, Timer, etc.), showing consistent MSE/MAE reductions. The comparison with LIFT and C-LoRA (Figure 4) demonstrates CoRA's advantage in few-shot settings, though C-LoRA is designed for end-to-end training rather than fine-tuning TSFMs, making the comparison slightly asymmetric. Notably, the TTM experiment shows that a channel-independent TTM + CoRA outperforms channel-dependent TTM without CoRA, suggesting the adapter successfully recovers cross-channel information. However, the paper omits comparison to recent end-to-end graph-based forecasters (e.g., DGCformer, MSGNet) which might outperform TSFM+CoRA when trained from scratch with full data.
The authors provide code at https://github.com/decisionintelligence/CoRA and use standard datasets (ETT, Electricity, Traffic, etc.) from TSFM-Bench. Implementation details including polynomial degree $K=1$, rank $M=N/4$, and projection layers $l_1=1, l_2=1$ are specified in Appendix F. However, critical hyperparameters like the temperature $\tau$ in contrastive loss and the initialization/learning rate for the threshold $\epsilon$ are not reported. The complexity analysis (Appendix B) is thorough, distinguishing between training ($O(N^2)$) and inference ($O(N)$) costs. Reproduction should be feasible for researchers with GPU resources, though the exact training time for the largest dataset (Electricity, $N=321$) is not specified.
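To show why the unreported temperature matters for reproduction, here is a sketch using a generic InfoNCE-style loss. This is a standard contrastive formulation and not necessarily the paper's Equation 11. The function `info_nce` and the similarity values are hypothetical.

```python
import numpy as np

def info_nce(sim_pos, sim_negs, tau):
    """Generic InfoNCE loss for one anchor: -log softmax of the positive similarity."""
    logits = np.concatenate(([sim_pos], sim_negs)) / tau
    logits -= logits.max()                      # shift for numerical stability
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_pos

sim_pos = 0.8                                   # similarity to the positive channel
sim_negs = np.array([0.5, 0.2, -0.1])           # similarities to negative channels

# The same similarities give very different loss magnitudes as tau changes.
losses = {tau: info_nce(sim_pos, sim_negs, tau) for tau in (0.05, 0.5, 5.0)}
for tau, loss in losses.items():
    print(f"tau={tau}: loss={loss:.4f}")
```

With identical embeddings, the loss moves from near zero at a small $\tau$ toward $\log 4$ at a large $\tau$, so the gradient scale of the contrastive term depends strongly on this value. Results may therefore be hard to match without the exact $\tau$.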
Most existing Time Series Foundation Models (TSFMs) use channel-independent modeling and focus on capturing and generalizing temporal dependencies, while neglecting the correlations among channels or overlooking the different aspects of correlations. However, these correlations play a vital role in multivariate time series forecasting. To address this, we propose a CoRrelation-aware Adapter (CoRA), a lightweight plug-and-play method that requires only fine-tuning with TSFMs and is able to capture different types of correlations, so as to improve forecast performance. Specifically, to reduce complexity, we innovatively decompose the correlation matrix into low-rank Time-Varying and Time-Invariant components. For the Time-Varying component, we further design learnable polynomials to learn dynamic correlations by capturing trends or periodic patterns. To learn positive and negative correlations that appear only among some channels, we introduce a novel dual contrastive learning method that identifies correlations through projection layers, regulated by a Heterogeneous-Partial contrastive loss during training, without introducing additional complexity in the inference stage. Extensive experiments on 10 real-world datasets demonstrate that CoRA can improve TSFMs in multivariate forecasting performance.