CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter

cs.LG cs.AI Hanyin Cheng, Xingjian Wu, Yang Shu, Zhongwen Rao, Lujia Pan, Bin Yang, Chenjuan Guo · Mar 23, 2026
AI Review
Plain-language introduction

Most Time Series Foundation Models treat channels independently and ignore cross-channel correlations, which limits their performance on multivariate forecasting. This paper proposes CoRA (CoRrelation-aware Adapter), a lightweight plug-in that learns three correlation types—dynamic (time-varying), heterogeneous (positive/negative), and partial (sparse)—through a low-rank decomposition and dual contrastive learning. The key insight is that these correlations can be captured during fine-tuning without re-pretraining the foundation model, and with only linear complexity at inference time.

Critical review
Verdict
Bottom line

The paper presents a well-engineered adapter that convincingly improves few-shot forecasting across six foundation models on ten benchmarks. The decomposition of correlation matrices into Time-Varying and Time-Invariant low-rank components (Equation 5) and the use of learnable polynomials for dynamic patterns are technically sound. The Heterogeneous-Partial Contrastive Learning (HPCL) framework avoids $O(N^2)$ complexity at inference by confining the pairwise computation to training. However, the tripartite taxonomy of correlations (DCorr, HCorr, PCorr) feels constructed for narrative convenience rather than emerging from the data, and the theoretical guarantees (Theorems 1-2) rely on strong local stationarity assumptions that are not validated empirically.

“$\bm{M}_{t}^{\text{corr}}=\bm{R}+\bm{Q}_{t}\bm{V}\bm{Q}_{t}^{T}$”
paper · Equation 5
“Definition 2 (Dynamic Correlation)... Definition 3 (Heterogeneous Correlation)... Definition 4 (Partial Correlation)”
paper · Appendix A
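
To make Equation 5 concrete, here is a minimal NumPy sketch. It assumes that each entry of $Q_t$ is a degree-$K$ polynomial in a scalar time index $t$ with learnable coefficient matrices; the review only says "learnable polynomials", so this parametrization, the function names, and the random stand-in values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def time_varying_basis(coeffs, t):
    """Evaluate Q_t = sum_k coeffs[k] * t**k (assumed parametrization).

    coeffs has shape (K + 1, N, M); t is a scalar time index.
    """
    powers = t ** np.arange(coeffs.shape[0])        # shape (K + 1,)
    return np.tensordot(powers, coeffs, axes=1)     # shape (N, M)

def correlation_matrix(R, coeffs, V, t):
    """Equation 5: M_t^corr = R + Q_t V Q_t^T."""
    Q_t = time_varying_basis(coeffs, t)
    return R + Q_t @ V @ Q_t.T

# Random values stand in for learned parameters (hypothetical sizes).
rng = np.random.default_rng(0)
N, M, K = 8, 2, 1                                   # channels, rank, degree
R = rng.normal(size=(N, N))
R = (R + R.T) / 2                                   # symmetric time-invariant part
coeffs = rng.normal(size=(K + 1, N, M))
V = rng.normal(size=(M, M))
V = (V + V.T) / 2                                   # symmetric core matrix

M0 = correlation_matrix(R, coeffs, V, t=0.0)
M1 = correlation_matrix(R, coeffs, V, t=0.5)
```

In this sketch the matrix changes with `t` only through the rank-`M` term, so the time-varying part never needs a full $N \times N$ set of parameters.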
What holds up

The low-rank factorization $Q_t V Q_t^T$ (with $Q_t \in \mathbb{R}^{N \times M}$, $M < N$) reduces parameters while preserving expressiveness, and Theorem 1 shows that this term can be written as the sum of a time-invariant and a time-varying matrix. The empirical results are robust: CoRA consistently outperforms baselines across all datasets in the 5% few-shot setting (Table 1), and ablations confirm that removing either DCE or HPCL degrades performance (Table 2). Most importantly, inference complexity is $O(N)$, which makes the adapter practical for high-channel datasets such as Electricity ($N=321$).

“In the inference phase, since CoRA only includes HD modules, the time complexity is $\mathcal{O}(N)$.”
paper · Section 4.5
“Compared to fine-tuning without CoRA, fine-tuning with CoRA achieves better results in average results... for both LLM-based models and time series pre-trained models”
paper · Table 1
“$\bm{Q}_{t}\bm{V}\bm{Q}_{t}^{T}$ can be expressed as the sum of a time-invariant matrix $\bm{M}_{i}$ and a time-varying matrix $\bm{M}_{v}$”
paper · Theorem 1
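
The additive split stated in Theorem 1 can be illustrated with a short expansion. This is a sketch under an assumed parametrization, not the paper's proof: it takes the polynomial degree reported in Appendix F, $K=1$, and supposes $Q_t = A + tB$ with constant matrices $A, B \in \mathbb{R}^{N \times M}$ (the symbols $A$ and $B$ are introduced here only for illustration).

```latex
\begin{aligned}
Q_t V Q_t^{T} &= (A + tB)\,V\,(A + tB)^{T} \\
&= \underbrace{A V A^{T}}_{M_i}
 + \underbrace{t\,(A V B^{T} + B V A^{T}) + t^{2}\,B V B^{T}}_{M_v}
\end{aligned}
```

Under this assumption the first term does not depend on $t$ and plays the role of $M_i$, while the remaining terms vanish at $t=0$ and carry all of the time dependence, matching the form of $M_v$.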
Main concerns

First, the three correlation types (DCorr, HCorr, PCorr) are defined mathematically in Appendix A but their mutual exclusivity and practical identifiability are not demonstrated. Second, Theorem 2's error bound for polynomials assumes the underlying correlation is a smooth function of the basis $q$, which may not hold for abrupt regime changes or non-stationary real-world data. Third, the HPCL loss (Equation 11) relies on the estimated correlation matrix $M_t^{corr}$ to define positive/negative pairs; if DCE estimates are noisy, the contrastive signal may be incorrect, yet no analysis of error propagation is provided. Finally, the 'learnable threshold' $\epsilon$ (Equation 10) lacks implementation details—how is it optimized and does it remain stable?

“$\mathcal{L}_{\text{pos}}=-\frac{1}{N}\sum_{i=1}^{N}\log\left(\frac{\sum_{j=1}^{N}\bm{M}_{t}^{\text{pos}}[i,j]\exp(\text{sim}(\bm{\tilde{X}}_{t}^{\text{pos}}[i],\bm{\tilde{X}}_{t}^{\text{pos}}[j])/\tau)}{\sum_{k=1}^{N}\exp(\text{sim}(\bm{\tilde{X}}_{t}^{\text{pos}}[i],\bm{\tilde{X}}_{t}^{\text{pos}}[k])/\tau)}\right)$”
paper · Equation 11
“$\bm{M}_{t}^{\text{pos}}=\begin{cases}m_{t}^{\text{corr}},&\text{if }\text{corr}>\epsilon\\ 0,&\text{else}\end{cases}$”
paper · Equation 10
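
The dependence of the contrastive signal on the estimated correlations is easier to see in code. The following is a minimal NumPy sketch of Equations 10 and 11 for a single time step. Several things are assumptions made for illustration: `sim` is taken to be cosine similarity, the values of `eps` and `tau` are arbitrary (the review notes they are not reported), and random features with their sample correlations stand in for the projected representations and the estimated $M_t^{corr}$.

```python
import numpy as np

def positive_mask(M_corr, eps):
    """Equation 10: keep correlations above the threshold eps, zero the rest."""
    return np.where(M_corr > eps, M_corr, 0.0)

def cosine_similarity(X):
    """Pairwise cosine similarity between rows of X (assumed choice of sim)."""
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X_unit @ X_unit.T

def positive_contrastive_loss(X_pos, M_pos, tau):
    """Equation 11 for one time step: correlation-weighted contrastive loss."""
    scores = np.exp(cosine_similarity(X_pos) / tau)   # shape (N, N)
    numerator = (M_pos * scores).sum(axis=1)          # weighted positive pairs
    denominator = scores.sum(axis=1)                  # all pairs
    return -np.mean(np.log(numerator / denominator))

# Stand-in data: N channels with D-dimensional features (hypothetical sizes).
rng = np.random.default_rng(0)
N, D = 4, 16
X = rng.normal(size=(N, D))
M_corr = np.corrcoef(X)            # stand-in for the estimated correlation matrix
M_pos = positive_mask(M_corr, eps=0.2)
loss = positive_contrastive_loss(X, M_pos, tau=0.1)
```

Because `M_pos` enters the numerator directly, any entry that is wrongly pushed above or below `eps` changes which pairs are treated as positives. The sketch also relies on the diagonal of the correlation matrix exceeding `eps`; otherwise a row with no positives would make the logarithm diverge.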
Evidence and comparison

The experimental comparison is comprehensive across 10 datasets and 6 backbones (GPT4TS, UniTime, Timer, etc.), showing consistent MSE/MAE reductions. The comparison with LIFT and C-LoRA (Figure 4) demonstrates CoRA's advantage in few-shot settings, though C-LoRA is designed for end-to-end training rather than fine-tuning TSFMs, making the comparison slightly asymmetric. Notably, the TTM experiment shows that a channel-independent TTM + CoRA outperforms channel-dependent TTM without CoRA, suggesting the adapter successfully recovers cross-channel information. However, the paper omits comparison to recent end-to-end graph-based forecasters (e.g., DGCformer, MSGNet) which might outperform TSFM+CoRA when trained from scratch with full data.

“TTM's Channel-Dependent (CD) and Channel-Independent (CI) versions... The better performance of the former [CI + CoRA] demonstrates that considering the mentioned three types of correlations allows the model to better understand the inter-channels interaction.”
paper · Section 6.2
“Since LIFT and C-LoRA are not specifically designed for TSFMs, the limited training samples in the few-shot setting lead to a degradation in their performance”
paper · Section 6.3
Reproducibility

The authors provide code at https://github.com/decisionintelligence/CoRA and use standard datasets (ETT, Electricity, Traffic, etc.) from TSFM-Bench. Implementation details including polynomial degree $K=1$, rank $M=N/4$, and projection layers $l_1=1, l_2=1$ are specified in Appendix F. However, critical hyperparameters like the temperature $\tau$ in contrastive loss and the initialization/learning rate for the threshold $\epsilon$ are not reported. The complexity analysis (Appendix B) is thorough, distinguishing between training ($O(N^2)$) and inference ($O(N)$) costs. Reproduction should be feasible for researchers with GPU resources, though the exact training time for the largest dataset (Electricity, $N=321$) is not specified.

“The Parameter Sensitivity analyses for the polynomial's degree $K$, the decomposition size $M$, and the number of projection layers $l_1,l_2$ are presented in Appendix F.2”
paper · Appendix F.2
“We have released our model code at: https://github.com/decisionintelligence/CoRA”
paper · Reproducibility statement
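
For a sense of scale, the reported settings ($K=1$, $M=N/4$) can be turned into a rough parameter count for the time-varying term on Electricity ($N=321$). This is back-of-the-envelope arithmetic under stated assumptions, not a figure from the paper: it assumes $Q_t$ is stored as $K+1$ coefficient matrices of size $N \times M$, rounds $N/4$ down to an integer, and ignores the time-invariant matrix $R$ and the projection layers. The function name is hypothetical.

```python
def low_rank_param_count(n_channels, rank, degree):
    """Parameters in Q_t V Q_t^T, assuming (degree + 1) coefficient matrices for Q_t."""
    q_params = (degree + 1) * n_channels * rank
    v_params = rank * rank
    return q_params + v_params

N = 321            # Electricity channels
M = N // 4         # reported rank M = N/4, rounded down (assumption)
K = 1              # reported polynomial degree

low_rank = low_rank_param_count(N, M, K)   # 57760
dense = (K + 1) * N * N                    # 206082 for a dense polynomial of the same degree
```

Under these assumptions the low-rank term needs roughly a quarter of the parameters of a dense degree-1 polynomial over the full $N \times N$ matrix.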
Abstract

Most existing Time Series Foundation Models (TSFMs) use channel independent modeling and focus on capturing and generalizing temporal dependencies, while neglecting the correlations among channels or overlooking the different aspects of correlations. However, these correlations play a vital role in Multivariate time series forecasting. To address this, we propose a CoRrelation-aware Adapter (CoRA), a lightweight plug-and-play method that requires only fine-tuning with TSFMs and is able to capture different types of correlations, so as to improve forecast performance. Specifically, to reduce complexity, we innovatively decompose the correlation matrix into low-rank Time-Varying and Time-Invariant components. For the Time-Varying component, we further design learnable polynomials to learn dynamic correlations by capturing trends or periodic patterns. To learn positive and negative correlations that appear only among some channels, we introduce a novel dual contrastive learning method that identifies correlations through projection layers, regulated by a Heterogeneous-Partial contrastive loss during training, without introducing additional complexity in the inference stage. Extensive experiments on 10 real-world datasets demonstrate that CoRA can improve TSFMs in multivariate forecasting performance.
