MAGPI: Multifidelity-Augmented Gaussian Process Inputs for Surrogate Modeling from Scarce Data

stat.ML cs.LG Atticus Rex, Elizabeth Qian, David Peterson · Mar 23, 2026

Plain-language introduction

Multifidelity surrogate modeling uses cheap low-fidelity simulations to improve predictions of expensive high-fidelity models when training data are scarce. This paper proposes MAGPI, a Gaussian process regression method that augments the high-fidelity input space with features derived from recursively trained low-fidelity surrogate models. The approach unifies desirable properties of cokriging and autoregressive estimators while allowing non-GP models at the low-fidelity levels, and the authors report higher predictive accuracy and lower computational cost than existing multifidelity GP methods on their test problems.

Critical review

Verdict

Bottom line

The paper presents a well-motivated and technically sound approach to multifidelity Gaussian process regression. The core innovation, augmenting the high-fidelity input space with predictions from lower-fidelity surrogate models, addresses the Markovian limitation of autoregressive methods while avoiding the cost of cokriging, which is cubic in the total number of training points across all fidelity levels. Proposition 1 provides a solid theoretical foundation, though it guarantees only that suitable hyperparameters exist, not that optimization will find them. The practical utility also depends heavily on the quality of the low-fidelity features, and the paper does not thoroughly investigate cases where low-fidelity models are systematically biased or misleading.

“The proposed method trains $K$ surrogate models to emulate each level of fidelity, starting from level $K$ (lowest-fidelity) and ending at level 1 (highest-fidelity). At each subsequent level $l$, the outputs from trained surrogate models $h_{l+1}$ through $h_K$ are used as features to provide additional information about the true model $y_l$.”

Section 3.1
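
The recursive construction quoted above can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the toy functions `f1`, `f2`, `f3`, the helper names `knn_fit`, `augment`, and `gp_fit`, and the fixed kernel hyperparameters (the paper tunes them by maximizing marginal likelihood). The sketch uses three fidelity levels, with K-nearest-neighbour surrogates at the two lower levels and a GP at the highest.

```python
import numpy as np

def knn_fit(X, y, k=3):
    """k-nearest-neighbour regressor; stands in for any point-estimate surrogate."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    def predict(Xq):
        d = np.linalg.norm(np.asarray(Xq, float)[:, None, :] - X[None, :, :], axis=-1)
        nearest = np.argsort(d, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def augment(X, surrogates):
    """Append surrogate outputs as extra input columns, lowest fidelity first.
    Each surrogate sees x plus the outputs of all surrogates before it."""
    Z = np.asarray(X, float)
    for h in surrogates:
        Z = np.column_stack([Z, h(Z)])
    return Z

def rbf(A, B, lengthscale):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_fit(Z, y, lengthscale=0.5, noise=1e-4):
    """Constant-mean GP with FIXED hyperparameters (an assumption for brevity)."""
    Z, y = np.asarray(Z, float), np.asarray(y, float)
    mu = y.mean()
    L = np.linalg.cholesky(rbf(Z, Z, lengthscale) + noise * np.eye(len(Z)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mu))
    def predict(Zq):
        Ks = rbf(np.asarray(Zq, float), Z, lengthscale)
        v = np.linalg.solve(L, Ks.T)
        var = np.clip(1.0 - (v ** 2).sum(axis=0), 0.0, None)  # latent variance
        return mu + Ks @ alpha, var
    return predict

# Hypothetical toy models: level 3 is lowest fidelity, level 1 is highest.
f3 = lambda X: np.sin(8 * X[:, 0])
f2 = lambda X: X[:, 0] ** 2 + 0.2
f1 = lambda X: f2(X) * f3(X)

rng = np.random.default_rng(0)
X3 = rng.uniform(0, 1, size=(200, 1))   # abundant low-fidelity data
X2 = rng.uniform(0, 1, size=(40, 1))
X1 = np.linspace(0, 1, 8)[:, None]      # scarce high-fidelity data

h3 = knn_fit(X3, f3(X3))                        # level 3: inputs x only
h2 = knn_fit(augment(X2, [h3]), f2(X2))         # level 2: inputs [x, h3]
Z1 = augment(X1, [h3, h2])                      # level 1: inputs [x, h3, h2]
h1 = gp_fit(Z1, f1(X1))

X_test = np.linspace(0, 1, 50)[:, None]
mean, var = h1(augment(X_test, [h3, h2]))
```

Note that prediction requires evaluating every lower-level surrogate at the query point before the high-fidelity GP can be called, which is why the same `augment` helper is reused at training and test time.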

“adding more features as inputs guarantees the existence of hyperparameters which achieve at least as high a marginal likelihood as any trained GP without the additional features”

Section 3.1 · Proposition 1

What holds up

The method's flexibility to use arbitrary regression models for low-fidelity data (not just GPs) is a significant practical advantage, demonstrated convincingly in the CFD example, where K-Nearest Neighbors replaces expensive GP training on tens of thousands of points. The sequential training procedure reduces the prohibitive $\mathcal{O}([\sum_{l=1}^K N_l]^3)$ cost of cokriging to $\mathcal{O}(N_1^3 + \sum_{l=2}^K \tau_{\text{train}}^{(l)})$, where $\tau_{\text{train}}^{(l)}$ is the training cost of the level-$l$ low-fidelity surrogate, making the method scalable to many fidelity levels with abundant low-fidelity data. The extrapolation experiment on laminar flame speed shows that the method can generalize outside the high-fidelity training domain by exploiting a mean function that combines the original inputs with low-fidelity predictions.

“Except for the highest level of fidelity, these surrogate models may be any regression method that provides a point estimate for the unknown function (e.g., linear regression, Deep Neural Networks, K-Nearest Neighbors, Random Forests, etc.).”

Section 3.1

“Because high-fidelity data are scarce (perhaps fewer than ten data points), the $\mathcal{O}(N_{1}^{3})$ training cost is not prohibitively expensive.”

Section 3.2
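
To give a rough sense of scale for the two complexity expressions, the following sketch evaluates their leading-order terms. The dataset sizes (45 high-fidelity points, 58,000 low-fidelity points) are taken from the CFD example as described in this review; the low-fidelity training cost is a hypothetical placeholder (linear in the number of points), and constants are ignored, so the ratio is only indicative.

```python
def cokriging_cost(sizes):
    """Leading-order cost of one joint GP over all levels: (sum_l N_l)^3."""
    return sum(sizes) ** 3

def sequential_cost(sizes, low_fidelity_train_costs):
    """N_1^3 for the high-fidelity GP plus the training cost of each
    low-fidelity surrogate (sizes[0] is the high-fidelity sample count)."""
    return sizes[0] ** 3 + sum(low_fidelity_train_costs)

sizes = [45, 58_000]      # N_1, N_2 as reported for the CFD example
low_costs = [58_000]      # hypothetical: assume surrogate training is O(N_2)
ratio = cokriging_cost(sizes) / sequential_cost(sizes, low_costs)
print(f"cokriging / sequential operation count: {ratio:.2e}")
```

Under these assumptions the joint solve is many orders of magnitude more expensive, which is consistent with the review's remark that full GPs on the low-fidelity data were computationally infeasible.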

“At the furthest extrapolation, 850K, the proposed method achieves an RMSE 13x less than the next-best surrogate model (Kennedy O'Hagan).”

Section 4.3

Main concerns

While Proposition 1 guarantees the existence of hyperparameters achieving at least the marginal likelihood of the baseline, it does not guarantee that optimization will find them, nor does it account for the increased risk of overfitting when using flexible mean functions or many low-fidelity features. Although the paper claims that $y_2, \dots, y_K$ may be arbitrarily ordered, the features are constructed recursively, so a systematically biased or poorly calibrated lower-fidelity model could propagate its errors to every level above it. Additionally, the conclusion notes "undesired oscillations in its predictive posterior", which suggests an unresolved instability in the kernel specification that could undermine reliability in safety-critical applications.

The theoretical analysis also assumes that the low-fidelity surrogate models are informative. It does not characterize how the method behaves when they are uncorrelated with the high-fidelity target, or related to it only in ways the chosen kernel and mean function cannot capture, in which case the augmented inputs may add noise rather than signal.

“In the worst case, no mean or kernel function in $\mathcal{M}_2$ and $\mathcal{K}_2$ is found that achieves a higher marginal likelihood than the mean functions and kernels in $\mathcal{M}_1$ and $\mathcal{K}_1$.”

Section 3.1 · Proposition 1 proof
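
The nesting argument behind the quoted worst case can be sketched as follows. This is our own illustration under stated assumptions, not the paper's proof: we assume an ARD squared-exponential kernel on the augmented input $(x, z)$, where $z$ collects the low-fidelity surrogate outputs, and the symbols $\sigma_f$, $\ell_i$, and $\ell_{z,j}$ are notation introduced here. In this particular setting the baseline kernel is recovered only in the limit of infinite lengthscales on $z$, so the inequality is stated with suprema.

```latex
% Sketch under assumptions: ARD squared-exponential kernel on augmented inputs (x, z).
k_2\big((x,z),(x',z')\big)
  = \sigma_f^2 \exp\!\Big(-\tfrac12 \sum_i \frac{(x_i - x_i')^2}{\ell_i^2}
                          -\tfrac12 \sum_j \frac{(z_j - z_j')^2}{\ell_{z,j}^2}\Big)
  \;\xrightarrow{\;\ell_{z,j}\to\infty\;}\;
  \sigma_f^2 \exp\!\Big(-\tfrac12 \sum_i \frac{(x_i - x_i')^2}{\ell_i^2}\Big)
  = k_1(x,x').

% If every (m, k) in M_1 x K_1 can be reproduced in M_2 x K_2 by ignoring z, then
\sup_{(m,k)\in\mathcal{M}_2\times\mathcal{K}_2} \log p(y \mid X, Z, m, k)
  \;\ge\;
\sup_{(m,k)\in\mathcal{M}_1\times\mathcal{K}_1} \log p(y \mid X, m, k).
```

The inequality says nothing about whether a gradient-based optimizer reaches the better optimum, which is the gap the concern above points to.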

“However, we emphasize that GP regression is only accurate if the true model belongs to the hypothesis class specified by the kernel and mean functions. A 99% confidence interval of the proposed method contains the true flow field 92.4% of the time; this is evidence of slight model misspecification which could be remedied by a different kernel and/or mean function formulations.”

Section 4.4
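
The coverage statistic in the quote above (a nominal 99% interval containing the truth 92.4% of the time) can be computed along these lines. This is a minimal sketch assuming a Gaussian predictive distribution at each test point; the function name `empirical_coverage` and the four-point data are hypothetical and unrelated to the paper's flow-field results.

```python
from statistics import NormalDist

def empirical_coverage(y_true, mean, std, level=0.99):
    """Fraction of true values inside the central `level` Gaussian interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 2.576 for level=0.99
    inside = [abs(y - m) <= z * s for y, m, s in zip(y_true, mean, std)]
    return sum(inside) / len(inside)

# Hypothetical predictions: the last point misses by 3 standard deviations.
y_true = [0.0, 1.0, 2.0, 3.0]
mean   = [0.1, 1.0, 2.0, 6.0]
std    = [1.0, 1.0, 1.0, 1.0]
coverage = empirical_coverage(y_true, mean, std)
```

A well-calibrated model would give coverage close to the nominal level on a large test set; coverage noticeably below it, as in the paper's CFD example, indicates overconfident predictive variances.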

“the proposed method produces undesired oscillations in its predictive posterior; further investigation into mitigating such artifacts may result in more accurate estimators.”

Section 5

Evidence and comparison

The empirical evaluation covers diverse scenarios: a synthetic 1D problem with nonlinear relationships (where the high-fidelity function is a product of the medium- and low-fidelity functions), a chemical kinetics extrapolation task requiring generalization outside the training temperature range, and a sparse-interpolation CFD problem with substantial low-fidelity data (up to 58,000 points). The comparisons against Kennedy O'Hagan, NARGP, and single-fidelity kriging consistently favor MAGPI across RMSE, $R^2$, and log marginal likelihood. However, the CFD example uses KNN approximations for the autoregressive baselines because training full GPs on the large low-fidelity datasets was computationally infeasible. This deviates from the canonical implementations of those baselines, though it fairly illustrates the practical constraints that MAGPI is designed to overcome. The comparison would be strengthened by including modern deep multifidelity methods and sparse GP approximations as baselines.

“The proposed method produced an RMSE value over 50% lower than the next lowest value (NARGP), an $R^2$ value approximately 10% higher than the next-highest value (NARGP), and a log ML roughly an order of magnitude higher than the next-highest value (NARGP).”

Section 4.2

“To enable comparison with these frameworks, we therefore used KNN approximations in place of trained low-fidelity GPs for both the Kennedy O'Hagan and NARGP approaches as well as in our proposed approach.”

Section 4.4

“This is an instance where a distinct accuracy hierarchy is not obvious without further investigation and an autoregressive method may propagate error in its predictions due to suboptimal model ordering.”

Section 4.4

Reproducibility

The paper provides detailed pseudocode (Algorithms 1 and 2) and specifies hyperparameter optimization with the Adam gradient-based optimizer and ARD kernels, which enables algorithmic reproduction. However, the text provides no code repository, software versions, or random seeds, which impedes independent reproduction. The high-fidelity CFD data consist of only 45 training points selected from a specific spatial region, raising questions about sensitivity to the choice and spatial distribution of the training set. The reliance on specialized chemical kinetics and CFD solvers (USC-II mechanism, LES/RANS simulations) without public datasets or standardized benchmarks further limits reproducibility, though this is typical for the application domain. The complexity analysis is thorough, but wall-clock timing comparisons are absent, making it difficult to assess the claimed practical computational savings.

“For pseudocode of the proposed method, refer to Algorithm 1 for offline training and Algorithm 2 for online prediction.”

Section 3.1 · Algorithms 1 and 2

“For maximum fairness across methods, each GP had its hyperparameters (mean/kernel parameters and white noise variance) tuned using gradient-descent with the ADAM algorithm, and each was allowed to iterate until convergence (over 1,000 iterations without an improvement in marginal likelihood).”

Section 4.2
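
The stopping rule in the quote above (iterate until more than 1,000 steps pass without an improvement) can be sketched as a patience-based Adam loop. This is an illustration only: the scalar parameter, the toy concave objective standing in for the log marginal likelihood, the function name `adam_maximize`, and the learning rate and iteration cap are all assumptions, not settings reported in the paper.

```python
import math

def adam_maximize(value_and_grad, theta, lr=0.05, patience=1000,
                  max_iter=100_000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Gradient ASCENT with Adam on a scalar parameter; stops once more than
    `patience` consecutive iterations fail to improve the best value seen."""
    m = v = 0.0
    best_value, best_theta, stale = -math.inf, theta, 0
    for t in range(1, max_iter + 1):
        value, grad = value_and_grad(theta)
        if value > best_value:
            best_value, best_theta, stale = value, theta, 0
        else:
            stale += 1
            if stale > patience:
                break
        m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)                # bias corrections
        v_hat = v / (1 - beta2 ** t)
        theta += lr * m_hat / (math.sqrt(v_hat) + eps)
    return best_theta, best_value

# Toy stand-in for a log marginal likelihood, maximized at theta = 2.
toy_objective = lambda th: (-(th - 2.0) ** 2, -2.0 * (th - 2.0))
best_theta, best_value = adam_maximize(toy_objective, theta=0.0)
```

Returning the best parameter seen, rather than the last iterate, matters with this rule because the final 1,000 iterations are by construction no better than the earlier optimum.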

“For high-fidelity training data, we selected a grid of points 5mm apart in the recirculation region (X coordinates $\leq 0.04$m) of the high-fidelity flow field.”

Section 4.4

Abstract

Supervised machine learning describes the practice of fitting a parameterized model to labeled input-output data. Supervised machine learning methods have demonstrated promise in learning efficient surrogate models that can (partially) replace expensive high-fidelity models, making many-query analyses, such as optimization, uncertainty quantification, and inference, tractable. However, when training data must be obtained through the evaluation of an expensive model or experiment, the amount of training data that can be obtained is often limited, which can make learned surrogate models unreliable. However, in many engineering and scientific settings, cheaper low-fidelity models may be available, for example arising from simplified physics modeling or coarse grids. These models may be used to generate additional low-fidelity training data. The goal of multifidelity machine learning is to use both high- and low-fidelity training data to learn a surrogate model which is cheaper to evaluate than the high-fidelity model, but more accurate than any available low-fidelity model. This work proposes a new multifidelity training approach for Gaussian process regression which uses low-fidelity data to define additional features that augment the input space of the learned model. The approach unites desirable properties from two separate classes of existing multifidelity GPR approaches, cokriging and autoregressive estimators. Numerical experiments on several test problems demonstrate both increased predictive accuracy and reduced computational cost relative to the state of the art.