Benchmarking Scientific Machine Learning Models for Air Quality Data

cs.LG Khawja Imran Masud, Venkata Sai Rahul Unnam, Sahara Ali · Mar 22, 2026
AI Review
Plain-language introduction

This paper benchmarks classical statistical models (LR, SARIMAX), deep learning approaches (MLP, LSTM), and physics-guided variants for multi-horizon AQI forecasting in Dallas County, North Texas. The core innovation is incorporating EPA breakpoint-based AQI formulations as consistency constraints via weighted loss functions ($\mathcal{L}_{total} = \lambda_{data}\mathcal{L}_{data} + \lambda_{phys}\mathcal{L}_{phys}$). The work addresses a practical need for standardized regional model comparison to guide public health decision-making.
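
As a rough illustration of the weighted loss above, the sketch below combines a data term with a consistency term. The exact form of $\mathcal{L}_{phys}$ is not reproduced in this review, so the mean-squared form of both terms and the names `mse`, `total_loss`, and `aqi_from_formula` are assumptions made for illustration, not the authors' implementation.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("sequences must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(aqi_pred, aqi_true, aqi_from_formula, lam_data=1.0, lam_phys=0.0):
    """L_total = lam_data * L_data + lam_phys * L_phys.

    L_data compares predictions with observed AQI. L_phys (assumed form)
    compares predictions with the AQI implied by the breakpoint formula.
    """
    l_data = mse(aqi_pred, aqi_true)
    l_phys = mse(aqi_pred, aqi_from_formula)
    return lam_data * l_data + lam_phys * l_phys


# Made-up numbers, purely to exercise the function.
pred = [50.0, 60.0]
obs = [52.0, 58.0]
formula = [51.0, 60.0]
data_only = total_loss(pred, obs, formula, lam_data=1.0, lam_phys=0.0)
blended = total_loss(pred, obs, formula, lam_data=0.5, lam_phys=0.5)
print(data_only, blended)
```

With `lam_phys=0.0` the loss reduces to the purely data-driven objective, which is the baseline the physics-guided variants are compared against.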

Critical review
Verdict
Bottom line

The paper delivers a useful regional benchmarking dataset and confirms that deep learning models outperform classical baselines for AQI forecasting. However, the claimed benefits of 'physics guidance' are marginal (often <0.5% MAE improvement) and are not supported by statistical significance testing. The 'physics-informed' framing is also misleading: the constraint uses the EPA's piecewise linear AQI lookup formula (Eq. 5: $f_{AQI}(C_i) = I_{low}^{(k)} + \frac{I_{high}^{(k)} - I_{low}^{(k)}}{C_{high}^{(k)} - C_{low}^{(k)}}(C_i - C_{low}^{(k)})$) rather than atmospheric physics or PDEs as in true PINNs (Raissi et al.). The study finds that physics constraints help PM2.5 more than O3, but it offers no explanation beyond noting ozone's photochemical complexity.

“$f_{AQI}(C_{i})=I_{\text{low}}^{(k)}+\frac{I_{\text{high}}^{(k)}-I_{\text{low}}^{(k)}}{C_{\text{high}}^{(k)}-C_{\text{low}}^{(k)}}\left(C_{i}-C_{\text{low}}^{(k)}\right)$”
paper · Section 4.3, Eq. 5
“physics guidance improves stability and yields physically consistent pollutant with AQI relationships”
paper · Abstract
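
To make the algebraic nature of Eq. 5 concrete, here is a minimal sketch of the piecewise linear lookup. The two-row table `EXAMPLE_BREAKPOINTS` is a hypothetical placeholder, not the official EPA table or the one used in the paper; the function name `f_aqi` simply mirrors the symbol in the equation.

```python
# Hypothetical two-segment breakpoint table: (C_low, C_high, I_low, I_high).
# These rows are placeholders for illustration; substitute the official EPA
# table for the pollutant and averaging period in question.
EXAMPLE_BREAKPOINTS = [
    (0.0, 12.0, 0.0, 50.0),
    (12.1, 35.4, 51.0, 100.0),
]


def f_aqi(c, breakpoints=EXAMPLE_BREAKPOINTS):
    """Linear interpolation of Eq. 5 inside the segment k that contains c."""
    for c_low, c_high, i_low, i_high in breakpoints:
        if c_low <= c <= c_high:
            slope = (i_high - i_low) / (c_high - c_low)
            return i_low + slope * (c - c_low)
    raise ValueError(f"concentration {c} falls outside the breakpoint table")


print(f_aqi(6.0))   # midpoint of the first placeholder segment
print(f_aqi(12.1))  # lower edge of the second placeholder segment
```

Nothing in this function encodes transport, chemistry, or conservation laws; it is a lookup with interpolation. Note that this sketch assumes concentrations are already rounded to the table's precision, so a value falling between two segments raises an error.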
What holds up

The systematic evaluation across four forecasting horizons (LAG $\in$ {1,7,14,30} days) and two pollutants (PM2.5 and O3) provides practical value for model selection in the Dallas region. The data curation from EPA stations is well-documented (Algorithm 1), and the chronological 80-20 split avoids data leakage. The empirical finding that deep learning models (MLP/LSTM) consistently outperform linear regression and SARIMAX (Tables 5-6) is sound and aligns with the literature. The observation that O3 exhibits higher volatility than PM2.5 (Figure 4) and is harder to predict at longer horizons is consistent with atmospheric chemistry expectations.

“With LAG=1, the targets are still rather near to the current AQI... As the AQI forecasting horizons become closer to LAG=7, 14, and 30, the future AQI becomes less dependent on the present and more variable”
paper · Section 6.1
“Rows in which the future AQI values are not valid (including the last L days of the dataset) were dropped in order to maintain dataset completeness”
paper · Section 3.3
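
The lag-wise dataset construction and chronological split described above can be sketched as follows. This is an illustration under simplifying assumptions: a single input feature (the current day's AQI) and made-up values. The helper names `make_lagged_pairs` and `chronological_split` are hypothetical and do not come from the paper's Algorithm 1.

```python
def make_lagged_pairs(series, lag):
    """Pair each day's value with the value `lag` days later.

    The last `lag` days have no future target and are dropped,
    mirroring the row-dropping step quoted from Section 3.3.
    """
    if lag < 1:
        raise ValueError("lag must be a positive number of days")
    return [(series[t], series[t + lag]) for t in range(len(series) - lag)]


def chronological_split(rows, train_fraction=0.8):
    """Split in time order: the earliest rows train, the latest rows test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


daily_aqi = [40, 42, 45, 43, 50, 55, 53, 48, 47, 46, 44]  # made-up values
pairs = make_lagged_pairs(daily_aqi, lag=1)
train, test = chronological_split(pairs)
print(len(pairs), len(train), len(test))
```

Because the split is by position rather than by random shuffling, every test row comes from later in time than every training row.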
Main concerns

First, the magnitude of improvement from physics guidance is questionable. For PM2.5 at LAG=1, MLP achieves an MAE of 8.2149 while MLP+Physics achieves 8.1695, a 0.55% improvement (Table 7). Without confidence intervals or statistical testing, differences of this size cannot be distinguished from noise. Second, the LSTM uses sequence length T=1 (Section 4.2.2), so it processes no temporal sequence and behaves essentially as a gated feedforward network. Third, the authors criticize Chandrashekar et al. [2] for lacking 'explicit physical equations', yet their own 'physics' is an algebraic AQI conversion formula, not atmospheric governing equations. Finally, physics guidance shows essentially no benefit for O3 (Table 8), yet the abstract obscures this limitation by highlighting aggregate improvements.
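
The first concern can be made concrete with a short sketch: the relative improvement computed from the Table 7 values, plus one possible way to attach an interval to an MAE difference. The paired percentile bootstrap is a suggestion of this sketch, not something the paper does, and the names `relative_improvement` and `paired_bootstrap_ci` are hypothetical.

```python
import random


def relative_improvement(baseline_mae, new_mae):
    """Percentage reduction in MAE relative to the baseline."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


def paired_bootstrap_ci(errors_a, errors_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for MAE(a) - MAE(b) over paired test days."""
    if len(errors_a) != len(errors_b):
        raise ValueError("error lists must be paired")
    rng = random.Random(seed)
    n = len(errors_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample test days with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        mae_a = sum(abs(errors_a[i]) for i in idx) / n
        mae_b = sum(abs(errors_b[i]) for i in idx) / n
        diffs.append(mae_a - mae_b)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


pct = relative_improvement(8.2149, 8.1695)  # Table 7 values quoted in this review
print(round(pct, 2))
```

If the resulting interval for the MAE difference contains zero, the improvement cannot be separated from resampling noise. For autocorrelated daily series, a block bootstrap may be a better fit than the independent resampling assumed here.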

“the sequence length is settled to T=1 along with two features, F=2”
paper · Section 4.2.2
“MLP (1.0, 0.0): MAE 8.2149... MLP+Physics (0.0, 1.0): MAE 8.1695”
paper · Table 7
“Yet the lack of explicit physical equations... raises the question of how directly 'physics-informed' the model really is”
paper · Section 2
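
The T=1 concern can be checked with a small numeric experiment. The sketch below uses a scalar LSTM cell with the standard gate equations and assumes a zero initial hidden and cell state, which is a common default but is not confirmed by the paper. The function `lstm_step` and the parameter names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell with the standard gate equations."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate state
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


names = ("w_i", "u_i", "b_i", "w_f", "u_f", "b_f",
         "w_o", "u_o", "b_o", "w_g", "u_g", "b_g")
params = {k: 0.5 for k in names}

# T = 1 with zero initial state: a single step is the whole forward pass.
h_a, _ = lstm_step(0.3, 0.0, 0.0, params)

# Change every forget-gate weight and every recurrent weight.
# f multiplies c_prev = 0 and each u_* multiplies h_prev = 0,
# so none of these parameters can influence the output.
altered = dict(params, w_f=-3.0, u_f=7.0, b_f=2.0, u_i=9.0, u_o=-4.0, u_g=1.5)
h_b, _ = lstm_step(0.3, 0.0, 0.0, altered)
print(h_a == h_b)
```

Under these assumptions the output reduces to $h_1 = o_1 \tanh(i_1 g_1)$, a function of the current input only, which is why the review treats the T=1 LSTM as a feedforward model.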
Evidence and comparison

The related work comparison (Table 1) is generally fair in noting that prior studies use different datasets and metrics, making direct comparison difficult. However, the paper conflates its simple algebraic constraint with the PDE-based physics constraints used by Shi et al. [24] ('Phy-APMR'), which actually implements pollutant transport equations. The authors' claim that physics guidance 'improves stability' is supported only by visual inspection of time-series plots (Figures 5-6), not quantitative stability metrics. The comparison between PM2.5 and O3 results adequately acknowledges that ozone's photochemical formation cannot be captured by simple breakpoint formulas, though this undermines the paper's broader claims about physics-guided improvement.

“Comparison of major ML, DL, hybrid, and physics-informed approaches”
paper · Table 1
“A simple breakpoint-based AQI calculation is not enough to consider the complicated photochemical reactions, weather, and precursor emissions required for ozone production”
paper · Section 7
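
One way the stability claim could be quantified, rather than judged from plots, is sketched below. Defining stability as the spread of rolling-window MAE is an assumption of this sketch; the paper does not define a stability metric, and the names `rolling_mae` and `stability_score` are hypothetical.

```python
import statistics


def rolling_mae(y_true, y_pred, window):
    """MAE over each consecutive window of the test period."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return [
        sum(errors[s:s + window]) / window
        for s in range(len(errors) - window + 1)
    ]


def stability_score(y_true, y_pred, window=7):
    """Standard deviation of rolling MAE; lower means steadier error over time."""
    return statistics.pstdev(rolling_mae(y_true, y_pred, window))


# Made-up series: one forecast with constant error, one with bursts of error.
truth = [50, 52, 54, 53, 51, 49, 48, 50]
steady = [t + 2 for t in truth]
erratic = [t + e for t, e in zip(truth, [0, 8, 0, 8, 0, 0, 0, 0])]
steady_score = stability_score(truth, steady, window=2)
erratic_score = stability_score(truth, erratic, window=2)
print(steady_score, erratic_score)
```

Reporting such a score for each model and its physics-guided variant, ideally across several random seeds, would let readers verify the stability claim numerically.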
Reproducibility

Reproducibility is limited by the absence of a code repository or dataset release link. While Algorithm 1 formally describes the dataset construction, the implementation relies on specific EPA portal downloads that are not archived. Hyperparameters are specified (Table 4), but the values of the 'fixed random seeds' mentioned in Section 5.1 are not stated, and the number of training epochs appears arbitrary ('fixed number of epochs' per Algorithm 2). The hardware specification (Intel Core i5, 8GB RAM) is given, but training wall-clock times are not reported, which makes computational cost comparisons with related work impossible. The paper mentions AdamW with a learning rate of 0.001 but does not describe learning rate schedules or early stopping criteria precisely enough to replicate.

“fixed random seeds were applied across all experiments”
paper · Section 5.1
“for epoch =1 to max_epochs do... Backpropagate \mathcal{L} and update model parameters”
paper · Algorithm 2
“Architecture / Order... 64-32... Learning rate 0.001”
paper · Section 5.2/Table 4
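
The kind of detail that would close these gaps can be sketched as a run manifest plus an explicit stopping rule. Everything marked hypothetical below (the seed value, the patience, the reading of "64-32" as hidden layer widths) is an assumption of this sketch and is not reported in the paper; `EarlyStopping` is a generic patience-based rule, not the authors' criterion.

```python
import json
import random


class EarlyStopping:
    """Stop once validation loss fails to improve by min_delta for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


manifest = {
    "seed": 42,                  # hypothetical; the paper states no seed value
    "optimizer": "AdamW",        # reported in the paper
    "learning_rate": 0.001,      # reported in the paper
    "hidden_layers": [64, 32],   # "64-32" in Table 4, read here as layer widths
    "max_epochs": None,          # not stated in the excerpts quoted here
    "early_stopping": {"patience": 2, "min_delta": 0.0},  # hypothetical
}
random.seed(manifest["seed"])
print(json.dumps(manifest, indent=2))

# Made-up validation losses to show when the rule fires.
stopper = EarlyStopping(**manifest["early_stopping"])
losses = [1.0, 0.8, 0.81, 0.82, 0.5]
stopped_at = next(i for i, v in enumerate(losses) if stopper.should_stop(v))
print(stopped_at)
```

Publishing a manifest like this alongside the code and archived data would let others rerun each experiment with the same settings.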
Abstract

Accurate air quality index (AQI) forecasting is essential for protecting public health in rapidly growing urban regions, and the practical model evaluation and selection are often challenged by the lack of rigorous, region-specific benchmarking on standardized datasets. Physics-guided machine learning and deep learning models could be a good and effective solution to resolve such issues with more accurate and efficient AQI forecasting. This research study presents an explainable and comprehensive benchmark that enables a guideline and proposed physics-guided best model by benchmarking classical time-series, machine-learning, and deep-learning approaches for multi-horizon AQI forecasting in North Texas (Dallas County). Using publicly available U.S. Environmental Protection Agency (EPA) daily observations of air quality data from 2022 to 2024, we curate city-level time series for PM2.5 and O3 by aggregating station measurements and constructing lag-wise forecasting datasets for LAG in {1,7,14,30} days. For benchmarking the best model, linear regression (LR), SARIMAX, multilayer perceptrons (MLP), and LSTM networks are evaluated with the proposed physics-guided variants (MLP+Physics and LSTM+Physics) that incorporate the EPA breakpoint-based AQI formulation as a consistency constraint through a weighted loss. Experiments using chronological train-test splits and error metrics MAE, RMSE showed that deep-learning models outperform simpler baselines, while physics guidance improves stability and yields physically consistent pollutant with AQI relationships, with the largest benefits observed for short-horizon prediction and for PM2.5 and O3. Overall, the results provide a practical reference for selecting AQI forecasting models in North Texas and clarify when lightweight physics constraints meaningfully improve predictive performance across pollutants and forecast horizons.
