Climate Prompting: Generating the Madden-Julian Oscillation using Video Diffusion and Low-Dimensional Conditioning
This paper proposes a conditional video diffusion model trained on ERA5 reanalysis to synthesize the Madden-Julian Oscillation (MJO)—the dominant mode of tropical intraseasonal variability. The core innovation is "climate prompting," where low-dimensional physical indices (MJO phase/amplitude via RMM-PCs, seasonal cycles, ENSO state) serve as conditioning tokens to generate physically consistent high-dimensional atmospheric fields. The work bridges the gap between interpretable low-order climate theory and high-resolution generative models, enabling controlled experiments like perpetual MJOs or isolated seasonal modulations for hypothesis testing.
The paper presents a compelling proof-of-concept for using conditional video diffusion as a physics-inspired generative tool for climate phenomena. The "climate prompting" paradigm—generating idealized MJO scenarios through controlled low-dimensional conditioning—is innovative and scientifically useful for deconstructing MJO dynamics. The model successfully captures key MJO characteristics including eastward propagation, quadrupole structure, and convectively coupled equatorial waves in wavenumber-frequency spectra. However, the work is somewhat limited by circular conditioning (using PCs derived from output fields as inputs), minimal quantitative validation beyond spectral analysis, and acknowledged biases in representing CC-Kelvin and MRG waves. The reproducibility is partially adequate though key details remain underspecified.
The fundamental concept of using low-dimensional climate indices as diffusion conditioning variables is sound and well-motivated. The wavenumber-frequency analysis (Fig. 2) credibly demonstrates that the model captures the essential spectral signature of the MJO ($k=1$-$3$, $\omega=0.01$-$0.03$ cpd) plus embedded convectively coupled Kelvin, Rossby, and MRG waves. The ensemble sampling analysis (Fig. 3) effectively illustrates the model's stochastic diversity, showing realistic seasonal variations in ensemble spread and freely generated equatorial waves as sample deviations. The progressive experimental design (isolated MJO $\rightarrow$ seasonal modulation $\rightarrow$ ENSO modulation) logically demonstrates how conditioning variables interact to shape MJO characteristics.
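For reference, the spectral diagnostic discussed above can be sketched as a raw wavenumber-frequency power spectrum. This is a minimal illustration, assuming a (time, longitude) array of equatorial anomalies on a full, evenly spaced longitude circle with daily sampling. The function name, the toy grid, and the synthetic wave are hypothetical, and the paper's actual processing (windowing, background normalization, symmetric/antisymmetric decomposition) is not reproduced here.

```python
import numpy as np

def wavenumber_frequency_power(field, dt_days=1.0):
    """Raw power spectrum of a (n_time, n_lon) array.

    Returns positive frequencies (cycles per day), signed integer zonal
    wavenumbers (positive = eastward), and power[frequency, wavenumber].
    """
    n_time, n_lon = field.shape
    coeffs = np.fft.fft2(field) / (n_time * n_lon)
    power = np.abs(coeffs) ** 2

    freq = np.fft.fftfreq(n_time, d=dt_days)
    # NumPy's transform kernel is exp(-i(...)), so an eastward wave
    # cos(k*x - w*t) appears at (freq = +w, index wavenumber = -k).
    # Flipping the sign makes positive wavenumber mean eastward.
    wavenum = -np.rint(np.fft.fftfreq(n_lon, d=1.0 / n_lon))

    keep = freq > 0                # keep the positive-frequency half only
    order = np.argsort(wavenum)    # sort wavenumbers from west to east
    return freq[keep], wavenum[order], power[keep][:, order]

# Toy check: an eastward wave with k = 2 and a 50-day period (0.02 cpd).
n_time, n_lon = 200, 64
t = np.arange(n_time)[:, None]
lon = 2.0 * np.pi * np.arange(n_lon)[None, :] / n_lon
toy_field = np.cos(2 * lon - 2.0 * np.pi * 0.02 * t)

freq, k, power = wavenumber_frequency_power(toy_field)
i_peak, j_peak = np.unravel_index(np.argmax(power), power.shape)
peak_freq, peak_k = freq[i_peak], k[j_peak]
```

In this toy case the peak falls inside the MJO band quoted above, which is the kind of check a quantitative comparison against ERA5 spectra could build on.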
The most critical issue is circularity in the conditioning: the RMM-UBC index (pc1, pc2) is derived directly from the model output fields (UBC and OLR) via projection onto empirical orthogonal functions, yet these same PCs serve as conditioning inputs. As acknowledged, "the principal components deduced from the sampled video closely match the conditionings" because they are computed from the generated fields themselves. This tautology limits what can be concluded about how independent the generated structures truly are from the prompting. The validation relies almost exclusively on wavenumber-frequency spectra composites rather than rigorous statistical metrics (e.g., MJO skill scores, bivariate correlation, amplitude/error distributions). Training details are also a concern: 20,000 steps with a condition dropout of 0.1 appear minimal for such high-dimensional spatiotemporal data, yet no training curves or convergence diagnostics are provided. The claim that generation takes ~30 minutes for 60 years and is "on par with intermediate complexity models" lacks comparative citations or standardized benchmarking.
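The circularity concern can be illustrated with a small synthetic example. The sketch below assumes orthonormal EOF patterns and flattened fields. All names, sizes, and data are hypothetical and are not taken from the paper. It shows that PCs recovered by projection match the conditioning exactly whenever the conditioned component is present, whatever the remainder of the field looks like.

```python
import numpy as np

def project_onto_eofs(fields, eofs):
    """Project (n_time, n_features) fields onto orthonormal (n_modes, n_features) EOFs."""
    return fields @ eofs.T

rng = np.random.default_rng(0)
n_time, n_features = 120, 64

# Two hypothetical orthonormal EOF patterns (rows), built via QR.
q, _ = np.linalg.qr(rng.standard_normal((n_features, 2)))
eofs = q.T

# Prescribed "conditioning" PCs: a circular trajectory with a 48-day period.
phase = 2.0 * np.pi * np.arange(n_time) / 48.0
pcs_cond = np.stack([np.cos(phase), np.sin(phase)], axis=1)

# Arbitrary content orthogonal to the EOFs, standing in for everything
# in the field that the conditioning does not constrain.
noise = rng.standard_normal((n_time, n_features))
residual = noise - (noise @ eofs.T) @ eofs

fields = pcs_cond @ eofs + residual
pcs_back = project_onto_eofs(fields, eofs)

# Share of total variance that lies outside the conditioned subspace.
unconstrained_fraction = np.sum(residual**2) / np.sum(fields**2)
```

Here `pcs_back` equals `pcs_cond` to rounding error while most of the variance sits in `residual`, so agreement between recovered and prescribed PCs says little on its own about the realism of the remaining structure.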
The spectral evidence (Figs. 2, 4, 6) supporting MJO and equatorial wave capture is visually convincing but lacks quantitative rigor. The comparison to ERA5 is qualitative rather than statistical—no RMSE, anomaly correlation, or MJO-specific metrics (RMM bivariate correlation, amplitude ratio) are reported. The paper positions itself against "traditional statistical methods" and "low-order models" but does not provide direct quantitative comparisons to these baselines. The cited related work in diffusion models for climate (Stock et al. 2024, Ren et al. 2025, Price et al. 2025) is appropriate, though the novelty relative to these approaches is primarily the specific focus on MJO low-dimensional conditioning rather than architectural advances. The claim that low-dimensional conditioning "decouples processes" is partially supported by the seasonal and ENSO modulation experiments (Figs. 6-7), though the interpretability benefit over simple compositing of observed data remains arguable.
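The MJO-specific metrics named above can be computed directly from pairs of RMM principal components. The following is a minimal sketch using commonly used definitions of bivariate correlation, bivariate RMSE, and mean amplitude ratio. The function names and the toy trajectories are hypothetical, and the paper reports none of these numbers.

```python
import numpy as np

def bivariate_correlation(ref, gen):
    """Bivariate correlation between two (n_time, 2) arrays of (pc1, pc2)."""
    ref, gen = np.asarray(ref, float), np.asarray(gen, float)
    num = np.sum(ref[:, 0] * gen[:, 0] + ref[:, 1] * gen[:, 1])
    den = np.sqrt(np.sum(ref**2)) * np.sqrt(np.sum(gen**2))
    return num / den

def bivariate_rmse(ref, gen):
    """Root-mean-square distance between the two trajectories in RMM space."""
    ref, gen = np.asarray(ref, float), np.asarray(gen, float)
    return np.sqrt(np.mean(np.sum((ref - gen) ** 2, axis=1)))

def amplitude_ratio(ref, gen):
    """Mean generated amplitude divided by mean reference amplitude."""
    ref, gen = np.asarray(ref, float), np.asarray(gen, float)
    return np.mean(np.hypot(gen[:, 0], gen[:, 1])) / np.mean(np.hypot(ref[:, 0], ref[:, 1]))

# Toy trajectories: a unit-amplitude MJO with a 48-day period.
phase = 2.0 * np.pi * np.arange(96) / 48.0
ref = np.stack([np.cos(phase), np.sin(phase)], axis=1)
weak = 0.5 * ref                                            # right phase, half amplitude
shifted = np.stack([-np.sin(phase), np.cos(phase)], axis=1)  # quarter-cycle phase error

cor_weak = bivariate_correlation(ref, weak)
ratio_weak = amplitude_ratio(ref, weak)
cor_shifted = bivariate_correlation(ref, shifted)
rmse_same = bivariate_rmse(ref, ref)
```

The toy cases show why the metrics are complementary: the half-amplitude trajectory keeps a perfect correlation but fails the amplitude ratio, while the phase-shifted one keeps the amplitude but loses all correlation.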
Reproducibility is partially addressed but has significant gaps. The architecture (U-Net with transformers, spatial/temporal attention) and key hyperparameters (4 hierarchy levels, 8 attention heads, 250 DDIM steps, $\eta=1$) are documented in Table 3. Data sources (WeatherBench2 ERA5, NOAA ERSSTv5) and preprocessing (Butterworth high-pass filter at 120 days, climatology removal) are specified. However, critical reproducibility barriers include: no code repository URL provided (only "adapted from Bastek et al. 2023"), no random seed specification for the stochastic DDIM sampling with $\eta=1$, no training/validation loss curves to assess convergence, and undisclosed optimizer details (learning rate, schedule, batch size). The 20,000 training step count seems suspiciously low—at 64×16 spatial resolution across 4 fields and 16-frame sequences, this suggests either very small effective training data or potential undertraining not disclosed in validation metrics. The "Brick-Wall Denoising" method (stride 3) lacks pseudocode or sufficient detail for independent implementation.
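To make the seed concern concrete, the sketch below implements the standard DDIM update with stochasticity parameter $\eta$ on a toy problem. The denoiser is a hypothetical stand-in (the exact noise predictor for standard-normal data), and the noise schedule, sample shape, and function names are assumptions for illustration only. None of this is the paper's model or sampler code.

```python
import numpy as np

def ddim_step(x_t, eps_pred, abar_t, abar_prev, eta, rng):
    """One DDIM update from cumulative alpha abar_t to abar_prev."""
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    sigma = (eta * np.sqrt((1.0 - abar_prev) / (1.0 - abar_t))
             * np.sqrt(1.0 - abar_t / abar_prev))
    direction = np.sqrt(np.maximum(1.0 - abar_prev - sigma**2, 0.0)) * eps_pred
    noise = sigma * rng.standard_normal(x_t.shape)  # nonzero only when eta > 0
    return np.sqrt(abar_prev) * x0_pred + direction + noise

def toy_eps(x_t, abar_t):
    """Hypothetical denoiser: exact noise prediction for standard-normal data."""
    return np.sqrt(1.0 - abar_t) * x_t

def sample(seed, eta=1.0, n_steps=250, shape=(4,)):
    """Run the full reverse process with a seeded generator."""
    rng = np.random.default_rng(seed)
    abar = np.linspace(1.0, 0.01, n_steps + 1)  # assumed linear schedule
    x = rng.standard_normal(shape)               # initial noise
    for i in range(n_steps, 0, -1):
        x = ddim_step(x, toy_eps(x, abar[i]), abar[i], abar[i - 1], eta, rng)
    return x

run_a = sample(seed=0)
run_b = sample(seed=0)
run_c = sample(seed=1)
```

With $\eta=1$, fresh noise enters at every one of the 250 steps, so two runs agree only if the generator state is fixed, as `run_a` and `run_b` do here. Reporting the seeds used for the published figures would therefore be needed to regenerate them exactly.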
Generative deep learning is a powerful tool for modeling the Madden-Julian oscillation (MJO) in the tropics, yet its relationship to traditional theoretical frameworks remains poorly understood. Here we propose a video diffusion model, trained on atmospheric reanalysis, to synthesize long MJO sequences conditioned on key low-dimensional metrics. The generated MJOs capture key features such as composites, power spectra, and multiscale structures including convectively coupled waves, despite some bias. We then prompt the model to generate more tractable MJOs based on intentionally idealized low-dimensional conditionings, for example a perpetual MJO or an isolated modulation by seasons and/or the El Niño-Southern Oscillation. This enables deconstructing the underlying processes and identifying physical drivers. The present approach provides a practical framework for bridging the gap between low-dimensional MJO theory and high-resolution atmospheric complexity and will help tropical atmosphere prediction.