BOxCrete: A Bayesian Optimization Open-Source AI Model for Concrete Strength Forecasting and Mix Optimization

cs.LG cs.AI Bayezid Baten, M. Ayyan Iqbal, Sebastian Ament, Julius Kusuma, Nishant Garg · Mar 23, 2026
Plain-language introduction

Concrete mix design requires balancing competing objectives of mechanical strength and sustainability. BOxCrete introduces a Gaussian Process regression framework trained on 533 strength measurements from 123 unique mixtures to predict compressive strength evolution over curing time and optimize mixes for embodied carbon using multi-objective Bayesian Optimization. The work addresses a critical gap in the literature by providing an open-source alternative to proprietary industrial datasets and models.

Critical review
Verdict
Bottom line

The paper presents a technically sound GP-based modeling approach with proper uncertainty quantification, but its performance claims rely on very limited validation data (12 concrete mixes). While the experimental design is systematic and the open-source intent is commendable, the small dataset size (123 unique formulations) and lack of generalization testing across diverse material sources limit the broad applicability of the model.

“validated against 12 independent concrete mixes excluded from training”
paper · Results section
What holds up

The phase-wise experimental design (Phases I-VI) demonstrates systematic expansion of the compositional design space, and the GP framework successfully captures time-dependent hydration behavior with quantitative uncertainty bounds. The use of a composite kernel combining time-dependent and joint composition-time components is physically motivated, and the multi-objective optimization coupling strength with Global Warming Potential addresses a practical industry need.

“The phase-wise learning strategy enables efficient characterization of nonlinear hydration behavior”
paper · Experimental Methods
“The model successfully reproduces the characteristic sigmoidal strength-evolution behavior exhibited across all mixtures”
paper · Section 2
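As an illustration of what a composite kernel of this kind can look like, the sketch below adds a time-only Matérn 5/2 term to a joint composition-time Matérn 5/2 term with one lengthscale per input (ARD). The additive form, the input features, and all lengthscale values are assumptions made for this example; the review does not give the paper's exact kernel expression.

```python
import math

def matern52_ard(a, b, lengthscales, variance=1.0):
    """Matérn 5/2 kernel with one lengthscale per input dimension (ARD)."""
    # Scaled Euclidean distance: each dimension is divided by its own lengthscale.
    r = math.sqrt(sum(((ai - bi) / ls) ** 2
                      for ai, bi, ls in zip(a, b, lengthscales)))
    s = math.sqrt(5.0) * r
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)

def composite_kernel(p, q, time_ls, joint_ls):
    """Assumed additive form: k_time(t, t') + k_joint([x, t], [x', t']).

    p and q are (composition, time) pairs; composition is a list of floats.
    """
    (xp, tp), (xq, tq) = p, q
    k_time = matern52_ard([tp], [tq], [time_ls])
    k_joint = matern52_ard(list(xp) + [tp], list(xq) + [tq], joint_ls)
    return k_time + k_joint

# Hypothetical inputs: [cement fraction, slag fraction] and log curing age (days).
mix_a = ([0.70, 0.30], math.log(1.0))
mix_b = ([0.70, 0.30], math.log(28.0))
mix_c = ([0.40, 0.60], math.log(28.0))
joint_ls = [0.5, 0.5, 2.0]  # one lengthscale per composition input, plus time

k_aa = composite_kernel(mix_a, mix_a, 2.0, joint_ls)  # same mix, same age
k_ab = composite_kernel(mix_a, mix_b, 2.0, joint_ls)  # same mix, different age
k_bc = composite_kernel(mix_b, mix_c, 2.0, joint_ls)  # different mix, same age
print(k_aa, k_ab, k_bc)
```

In this form the time-only term lets every mixture share a common strength-gain trend, while the joint term lets that trend vary with composition. A short ARD lengthscale on an ingredient means the predicted strength is sensitive to it.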
Main concerns

Validation rests on only 12 concrete mixes (Testing Set 1). Table 1 shows $R^2$ ranging from 0.67 to 0.94 across random testing sets, with the instability most visible in early-age predictions (1-day $R^2 = 0.80 \pm 0.12$), which suggests the estimates are not stable at this sample size. The headline '$R^2 = 0.94$' therefore obscures weaker early-age performance. In addition, training pools mortar (69 mixes) and concrete (54 mixes) without separate validation on each material type, so the reported metrics cannot be attributed to either one; this raises generalization concerns, as aggregates fundamentally alter hydration kinetics.

“R² = 0.94 ± 0.01 and RMSE = 0.69 ± 0.07 ksi”
paper · Table 1
“1-Day ... 0.80 ± 0.12”
paper · Table 1
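The sample-size concern can be made concrete with a small simulation. The sketch below uses purely synthetic data (strengths drawn uniformly from 1 to 15 ksi and a hypothetical Gaussian prediction error of 1 ksi, neither taken from the paper) and compares how much $R^2$ fluctuates when it is computed on 12 test points versus 500. It illustrates sampling variability in general and is not a reanalysis of Table 1.

```python
import random
import statistics

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def r2_spread(test_size, n_repeats=1000, noise_sd=1.0, seed=0):
    """Mean and standard deviation of R^2 over repeated synthetic test sets."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        # Synthetic "measured" strengths (ksi) and noisy "predictions".
        y_true = [rng.uniform(1.0, 15.0) for _ in range(test_size)]
        y_pred = [y + rng.gauss(0.0, noise_sd) for y in y_true]
        scores.append(r_squared(y_true, y_pred))
    return statistics.mean(scores), statistics.stdev(scores)

mean_small, sd_small = r2_spread(test_size=12)
mean_large, sd_large = r2_spread(test_size=500)
print(f"n=12:  mean R^2 = {mean_small:.3f}, sd = {sd_small:.3f}")
print(f"n=500: mean R^2 = {mean_large:.3f}, sd = {sd_large:.3f}")
```

The underlying model quality is identical in both cases, so any difference in spread comes from test-set size alone. That is why a per-age $R^2$ computed on about a dozen mixes should be read with a wide margin.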
Evidence and comparison

Comparisons to Pfeiffer et al. (2024) and Young et al. (2019) are methodologically fair, though those studies used noisy field/industrial data whereas this work uses controlled laboratory measurements. The claim that BOxCrete achieves 'similar or better accuracy with nearly two orders of magnitude fewer data points' (123 vs ~9,296 mixtures, a factor of about 76) is arithmetically accurate but misleading regarding coverage: 123 lab mixes cover far less of the global material variability than thousands of field records. The reported $R^2 = 0.94$ exceeds the cited benchmarks, but the controlled setting and small test set limit comparability to real-world variability.

“Pfeiffer et al. (2024)... reported R² ≈ 0.88 with RMSE ≈ 0.91 ksi”
paper · Section 2
“BOxCrete achieves similar or better accuracy with nearly two orders of magnitude fewer data points”
paper · Section 2
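The "nearly two orders of magnitude" figure can be checked directly from the two mixture counts quoted in this review. This is a quick arithmetic sketch, and the variable names are illustrative.

```python
import math

boxcrete_mixes = 123     # unique mixtures in the BOxCrete dataset
benchmark_mixes = 9296   # approximate benchmark count quoted in this review

ratio = benchmark_mixes / boxcrete_mixes
orders_of_magnitude = math.log10(ratio)
print(f"{ratio:.1f}x fewer mixtures, {orders_of_magnitude:.2f} orders of magnitude")
# prints "75.6x fewer mixtures, 1.88 orders of magnitude"
```

A factor of about 76 is consistent with "nearly two orders of magnitude", so the disagreement is about coverage, not arithmetic.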
Reproducibility

While the paper states that the code and dataset are released under the MIT license as 'open-access resources,' the provided text contains no URLs, DOIs, or repository identifiers, so the work cannot be reproduced from the text alone. Hyperparameters are well-documented (Matérn 5/2 kernel with ARD, qLogEHVI acquisition function), and the experimental mix proportions are detailed in Table 2. However, without access to the actual code repository and the raw strength measurements, the claimed $R^2 = 0.94$ cannot currently be verified independently.

“providing an open-source repository under the MIT license”
paper · Limitations section
“Matérn 5/2 kernel with Automatic Relevance Determination (ARD)”
paper · Experimental Methods
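Hypervolume-based acquisition functions such as qLogEHVI score candidate mixes by how much they are expected to enlarge the region dominated by the current Pareto front. The sketch below does not implement qLogEHVI itself. It only computes the underlying quantities, a Pareto front and its 2D hypervolume, for strength (maximized) against Global Warming Potential (minimized). The candidate values, units, and reference point are hypothetical.

```python
def pareto_front(points):
    """Non-dominated (strength, gwp) pairs: higher strength and lower GWP are better."""
    front = []
    for s, g in points:
        dominated = any(
            (s2 >= s and g2 <= g) and (s2 > s or g2 < g)
            for s2, g2 in points
        )
        if not dominated:
            front.append((s, g))
    return sorted(front)  # ascending strength, hence ascending GWP on a front

def hypervolume_2d(front, ref):
    """Area dominated by the front relative to ref = (min strength, max GWP).

    Assumes every front point is better than ref in both objectives.
    """
    ref_s, ref_g = ref
    area, prev_s = 0.0, ref_s
    for s, g in front:
        # This strip of strength values is covered down to GWP level g.
        area += (s - prev_s) * (ref_g - g)
        prev_s = s
    return area

# Hypothetical candidates: (28-day strength in ksi, GWP in kg CO2e per m^3).
candidates = [(4.0, 200.0), (6.0, 260.0), (5.0, 300.0), (8.0, 350.0)]
front = pareto_front(candidates)
hv = hypervolume_2d(front, ref=(2.0, 400.0))
print(front)  # (5.0, 300.0) is dominated by (6.0, 260.0) and drops out
print(hv)
```

A new mix is valuable under this criterion only if it pushes the front outward, for example by matching an existing strength at lower GWP. Simply adding another dominated point leaves the hypervolume unchanged.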
Abstract

Modern concrete must simultaneously satisfy evolving demands for mechanical performance, workability, durability, and sustainability, making mix designs increasingly complex. Recent studies leveraging Artificial Intelligence (AI) and Machine Learning (ML) models show promise for predicting compressive strength and guiding mix optimization, but most existing efforts are based on proprietary industrial datasets and closed-source implementations. Here we introduce BOxCrete, an open-source probabilistic modeling and optimization framework trained on a new open-access dataset of over 500 strength measurements (1-15 ksi) from 123 mixtures - 69 mortar and 54 concrete mixes tested at five curing ages (1, 3, 5, 14, and 28 days). BOxCrete leverages Gaussian Process (GP) regression to predict strength development, achieving average R$^2$ = 0.94 and RMSE = 0.69 ksi, quantify uncertainty, and carry out multi-objective optimization of compressive strength and embodied carbon. The dataset and model establish a reproducible open-source foundation for data-driven development of AI-based optimized mix designs.
