Sonny: Breaking the Compute Wall in Medium-Range Weather Forecasting

cs.LG cs.AI cs.CV physics.ao-ph Minjong Cheon · Mar 22, 2026

Plain-language introduction

Sonny tackles the compute barrier in medium-range weather forecasting by proposing a hierarchical transformer that trains on a single A40 GPU in 5.5 days. The core idea is a two-stage StepsNet pipeline: a narrow 'slow path' processes large-scale dynamics (U,V,Z,P) first, then a full-width 'fast path' integrates thermodynamics (T,Q). Combined with EMA during training, randomized dynamics forecasting, and pressure-weighted losses, Sonny aims to deliver competitive forecast skill without the TPU/GPU cluster requirements of models like Pangu-Weather or GraphCast.

Critical review

Verdict

Bottom line

Sonny demonstrates that medium-range weather forecasting can be achieved with modest compute: training converges in 5.5 days on a single A40. The model shows competitive ACC scores against HRES and outperforms FastNet in tropical regions at extended lead times (Fig. 4). The EMA ablation shows consistent RMSE reductions averaging 3.34% across all variables. However, the paper's architectural contributions are largely borrowed: StepsNet comes from Han et al. [12], while randomized dynamics forecasting and pressure-weighted losses come from Nguyen et al. [17] (Stormer). The primary novelty lies in applying this specific combination to create a reproducible small-scale baseline.

“applying EMA yields an overall average error reduction of approximately 3.34% across the entire forecast period”

paper · Section 3.1

“Sonny significantly widened the performance gap with FastNet O96, maintaining high predictive skill”

paper · Section 3.2

What holds up

The compute efficiency claim is well supported: 20.5M parameters, 96.81 GMACs, and 5.5 days of training on one A40 GPU (about 32 GB peak memory) form a plausible budget that academic labs could reproduce. The variable-aware embedding, which separates dynamic variables from thermodynamic ones, is physically motivated and mirrors the hierarchy of atmospheric processes. The EMA ablation is thorough across nine variables and shows consistent gains. Case studies on Typhoon Nanmadol (137.5 km track error at 120h) and Winter Storm Elliott demonstrate practical utility for high-impact events without the over-smoothing typical of low-resolution models.

“Sonny: 20.50M params, 96.81 MACs, 1×A40, 5.5 days”

paper · Table 3

“track error is about 137.5 km, comparable to reported results from leading AI weather models including Pangu-Weather and GraphCast”

paper · Section 3.3
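For readers unfamiliar with how a track error such as the 137.5 km figure is usually measured, the sketch below computes the great-circle (haversine) distance between a forecast and an observed storm centre. The review does not say which distance formula the paper uses, so the haversine choice, the mean Earth radius, and the function name `great_circle_km` are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant, not from the paper


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Length of one degree of latitude, and the latitude spacing of a 1.5 degree grid.
one_degree_km = great_circle_km(0.0, 0.0, 1.0, 0.0)
grid_spacing_km = 1.5 * one_degree_km
```

Under these assumptions, a 1.5° grid is spaced roughly 167 km apart in latitude, so a 137.5 km track error is smaller than one grid cell in that direction. That puts the reported figure in context when weighing a single case study.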

Main concerns

First, architectural novelty is limited: the two-stage StepsNet design is credited to Han et al. [12], and both randomized dynamics forecasting ($\delta t \sim \mathcal{U}\{6,12,24\}$) and the pressure-weighted loss are attributed to Nguyen et al. [17], where they are central components of Stormer. EMA as a training stabilizer is a standard technique, and the claim that it replaces 'computationally expensive fine-tuning' is not backed by a comparison against a model that was actually fine-tuned. Second, the comparison table (Table 3) mixes incompatible resolutions (0.25°, 1.4°, and 1.5°), making hardware and training-time comparisons misleading. The paper compares Sonny against FastNet O96 without clarifying whether this denotes resolution or configuration, and omits direct comparison with larger AI models (Pangu, GraphCast) at the same 1.5° resolution.

“$\delta t$ was sampled from a discrete uniform distribution $P(\delta t) \sim \mathcal{U}\{6,12,24\}$ hours [17]”

paper · Section 2.2
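A minimal sketch of what sampling $\delta t$ from $\mathcal{U}\{6,12,24\}$ could look like when building training pairs. It assumes the training data is a 6-hourly sequence of states indexed by integer position; that assumption, the helper name `sample_training_pair`, and the seed are illustrative and not taken from the paper.

```python
import random

INTERVALS_HOURS = (6, 12, 24)  # the discrete uniform support quoted above


def sample_training_pair(num_steps, rng, step_hours=6):
    """Pick an (input index, target index, dt) triple from a 6-hourly series.

    step_hours=6 is an assumed data cadence, not a detail confirmed by the paper.
    """
    dt = rng.choice(INTERVALS_HOURS)      # uniform over {6, 12, 24}
    offset = dt // step_hours             # how many stored steps ahead the target lies
    start = rng.randrange(num_steps - offset)
    return start, start + offset, dt


rng = random.Random(0)
pairs = [sample_training_pair(1000, rng) for _ in range(200)]
```

Each triple tells the training loop which state to feed in, which state to predict, and which interval to condition the model on.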

“We bypassed this computationally expensive fine-tuning phase by implementing an Exponential Moving Average (EMA)”

paper · Section 2.4
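Since the review notes that the exact EMA implementation is not specified, the following is only a generic sketch of an exponential moving average over model parameters. The decay value, the dictionary-of-floats representation, and the name `ema_update` are assumptions; real implementations typically operate on tensors and use a decay much closer to 1.

```python
def ema_update(shadow, params, decay):
    """Blend the current parameters into the running average, in place."""
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


params = {"w": 1.0}
shadow = dict(params)          # the EMA copy starts from the initial weights

for _ in range(3):
    params["w"] += 1.0         # stand-in for one optimizer step
    ema_update(shadow, params, decay=0.5)  # decay=0.5 only to keep the arithmetic visible
```

After three steps the raw weight is 4.0 while the averaged copy is 3.125. The averaged copy lags and smooths the trajectory, which is the property that makes EMA weights attractive for evaluation and rollout.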

“721×1440 (0.25°)... 128×256 (~1.4°)... 121×240 (1.5°)”

paper · Table 3
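To make the resolution mismatch concrete, the grid sizes quoted from Table 3 can be turned into grid-point counts. This is plain arithmetic on the quoted dimensions; the dictionary keys are just the resolution labels.

```python
# Grid dimensions (latitude x longitude) as quoted from Table 3.
GRIDS = {
    "0.25°": (721, 1440),
    "~1.4°": (128, 256),
    "1.5°": (121, 240),
}

points = {name: n_lat * n_lon for name, (n_lat, n_lon) in GRIDS.items()}

# How many times more grid points the finest grid has than the 1.5 degree grid.
ratio_fine_to_coarse = points["0.25°"] / points["1.5°"]
```

The 0.25° grid has 1,038,240 points against 29,040 at 1.5°, a factor of about 36 per field. Hardware and training-time figures across these rows are therefore not like-for-like.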

Evidence and comparison

The evidence supports Sonny's competitive performance against HRES and FastNet, but comparisons are limited. Tropical advantages over FastNet O96 are documented (Fig. 4), though FastNet's O96 designation is undefined. The paper does not compare against Stormer or other efficient baselines at matching resolution and hardware constraints. The randomized dynamics approach enables multi-path inference for uncertainty estimation, but no probabilistic results are shown. Case studies are qualitative, and the reported track error, described as comparable to leading models, comes without confidence intervals or statistical testing across multiple events.

“comparing Sonny, Met Office GM, and FastNet O96 over NHET, SHET, and the Tropics”

paper · Figure 4 caption

“multi-path inference from randomized intervals can be leveraged to estimate forecast uncertainty”

paper · Section 4
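One way to read "multi-path inference" is that a given lead time can be reached by different sequences of 6, 12, and 24 hour steps, and the spread across those rollouts serves as an uncertainty estimate. The sketch below only enumerates the candidate paths. Treating every ordered sequence as a distinct path is an assumption, since the paper may restrict itself to a subset, and `interval_paths` is a hypothetical helper.

```python
def interval_paths(lead_hours, intervals=(6, 12, 24)):
    """Return every ordered sequence of intervals that sums to lead_hours."""
    if lead_hours == 0:
        return [[]]            # one way to cover zero hours: take no steps
    paths = []
    for dt in intervals:
        if dt <= lead_hours:
            for rest in interval_paths(lead_hours - dt, intervals):
                paths.append([dt] + rest)
    return paths


paths_24h = interval_paths(24)
```

For a 24 hour lead this yields six paths, from four 6 hour steps to a single 24 hour step. Running the model along each one and comparing the outputs would give the kind of ensemble the paper alludes to but does not evaluate.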

Reproducibility

Reproducibility is partially addressed but incomplete. Table 2 provides optimizer settings (AdamW, lr=5e-4, bs=16, 50 epochs) and Table 3 includes parameter counts and compute estimates. However, the paper does not explicitly state code availability or release a repository despite claiming to provide a 'practical training recipe aimed at reproducibility.' Data preprocessing details, checkpointing strategy, and exact EMA implementation specifics are omitted. The 5.5-day training claim assumes specific hardware (A40) but does not document variance or convergence criteria across random seeds. Full reproduction would require code release, exact training data splits, and hyperparameter configurations for the StepsNet stages.

“Full training process converged in approximately 5.5 days on a single NVIDIA A40 GPU, with peak GPU memory usage of about 32 GB”

paper · Section 2.5

“Backbone setting: ViT-S configuration, Number of parameters: 20.5M, Batch size: 16, Total epochs: 50”

paper · Table 2

Abstract

Weather forecasting is a fundamental problem for protecting lives and infrastructure from high-impact atmospheric events. Recently, data-driven weather forecasting methods based on deep learning have demonstrated strong performance, often reaching accuracy levels competitive with operational numerical systems. However, many existing models rely on large-scale training regimes and compute-intensive architectures, which raises the practical barrier for academic groups with limited compute resources. Here we introduce Sonny, an efficient hierarchical transformer that achieves competitive medium-range forecasting performance while remaining feasible within reasonable compute budgets. At the core of Sonny is a two-stage StepsNet design: a narrow slow path first models large-scale atmospheric dynamics, and a subsequent full-width fast path integrates thermodynamic interactions. To stabilize medium-range rollout without an additional fine-tuning stage, we apply exponential moving average (EMA) during training. On WeatherBench2, Sonny yields robust medium-range forecast skill, remains competitive with operational baselines, and demonstrates clear advantages over FastNet, particularly at extended tropical lead times. In practice, Sonny can be trained to convergence on a single NVIDIA A40 GPU in approximately 5.5 days.
