Amortized Variational Inference for Logistic Regression with Missing Covariates

cs.LG eess.SP M. Cherifi, Aude Sportisse, Xujia Zhu, Mohammed Nabil El Korso, A. Mesloub · Mar 22, 2026

AI Review

Plain-language introduction

The paper proposes AV-LR, a lightweight amortized variational inference framework for logistic regression with missing covariates that eliminates latent variables entirely. Unlike VAE-based competitors, it directly models the posterior over missing values using a single neural network coupled with a linear classification layer, enabling joint optimization of imputation and prediction. The approach extends naturally to MNAR settings and claims substantial computational speedups over EM-based methods while maintaining comparable statistical accuracy.

Critical review

Verdict

Bottom line

The paper presents a well-motivated and technically sound approach that bridges the gap between computationally expensive EM algorithms and over-parameterized VAE-based methods. The core idea, amortizing inference directly over missing covariates without introducing auxiliary latent variables, is both elegant and pragmatic. The empirical evaluation shows AV-LR matching or exceeding SAEM and DLGLM across MCAR, MAR, and MNAR settings, with training roughly two orders of magnitude faster than SAEM (e.g., ~35 seconds vs ~3900 seconds under MCAR, Table 2). However, the paper occasionally oversimplifies the limitations of VAE-based alternatives and would benefit from deeper theoretical analysis of the variational gap introduced by the Gaussian variational family used by the inference network.

“AV-LR required a moderate training time... while SAEM exhibited a substantially higher training cost (on the order of $10^3$ seconds)”

Table 2 · Section 7.2.1

What holds up

The architectural simplicity of AV-LR is a genuine contribution: eliminating the latent variables $\bm{z}$ reduces model complexity while retaining expressiveness through the direct amortized posterior $q_{\bm{\phi}}(\bm{x}_{\text{mis}} \mid \bm{x}_{\text{obs}}, y)$. The extension to MNAR through explicit modeling of $p_{\bm{\psi}}(\bm{r}_i \mid \bm{x}_i, y_i)$ is well-executed and empirically validated, showing consistent improvements under non-ignorable mechanisms: "the non-ignorable extension of AV-LR exhibits consistent and substantial performance improvements, observed in AUC, accuracy, and the Brier score" (Section 7.1). The experimental protocol is comprehensive, covering synthetic and real-world datasets under several missingness mechanisms, including Self-Masking, Logistic, and Sequential Logistic.

“the non-ignorable extension of AV-LR exhibits consistent and substantial performance improvements, observed in AUC, accuracy, and the Brier score, relative to the ignorable variant”

Section 7.1 · Classification performance
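For intuition about why a non-ignorable mechanism calls for an explicit missingness model, the sketch below simulates a self-masking mechanism in Python with NumPy. The logistic form, the `slope` and `intercept` values, and the function name `self_masking_mask` are illustrative assumptions; the review does not give the paper's exact parameterization.

```python
import numpy as np


def self_masking_mask(x, slope=2.0, intercept=0.0, seed=0):
    """Return a boolean mask (True = missing).

    Assumed self-masking form: each entry's probability of being missing
    depends only on its own value, through a logistic function.
    """
    rng = np.random.default_rng(seed)
    p_missing = 1.0 / (1.0 + np.exp(-(slope * x + intercept)))
    return rng.random(x.shape) < p_missing


# Toy data: standard normal covariates (hypothetical, not the paper's data).
data_rng = np.random.default_rng(1)
x = data_rng.standard_normal((5000, 3))
mask = self_masking_mask(x)

# Mean of the entries that remain observed after masking.
observed_mean = x[~mask].mean()
print(f"full mean: {x.mean():.3f}, observed mean: {observed_mean:.3f}")
```

Because large values are more likely to be hidden, the mean of the observed entries is pulled below the full-data mean, which is the kind of distortion a model that ignores the mechanism cannot correct.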

Main concerns

The paper assumes multivariate Gaussian covariates throughout ($\bm{x}_i \sim \mathcal{N}(\bm{\mu}, \bm{\Sigma})$), which limits applicability to non-Gaussian or mixed-type data, a constraint acknowledged only briefly in the conclusion. The importance-weighted ELBO objective relies on drawing $K$ importance samples, with no theoretical guidance on how $K$ should scale with dimensionality. A more significant issue is that the prediction procedure under MNAR (Section 5) samples from $q_{\bm{\phi}}(\bm{x}_{i,\text{mis}} \mid \bm{x}_{i,\text{obs}}, y_i=c, \bm{r}_i)$ separately for each class $c \in \{0,1\}$, with $S$ draws per class. Test-time cost therefore scales as $2S$ evaluations per observation rather than the single forward pass of a standard classifier, a cost the paper does not fully discuss.

“we assume the complete covariate vector $\bm{x}_i$ follows a multivariate normal distribution $\bm{x}_i \sim \mathcal{N}(\bm{\mu}, \bm{\Sigma})$”

Section 2 · Model Setup
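To make the Gaussian assumption concrete, the following sketch computes the standard conditional distribution of the missing block given the observed block for a multivariate normal. It is a textbook identity, not code from the paper, and the function name `gaussian_conditional` and the toy numbers are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_conditional(mu, sigma, x_obs, obs_idx, mis_idx):
    """Mean and covariance of x_mis | x_obs when x ~ N(mu, sigma)."""
    mu_o, mu_m = mu[obs_idx], mu[mis_idx]
    s_oo = sigma[np.ix_(obs_idx, obs_idx)]
    s_mo = sigma[np.ix_(mis_idx, obs_idx)]
    s_mm = sigma[np.ix_(mis_idx, mis_idx)]
    # gain = s_mo @ inv(s_oo), computed with a solve (s_oo is symmetric).
    gain = np.linalg.solve(s_oo, s_mo.T).T
    cond_mean = mu_m + gain @ (x_obs - mu_o)
    cond_cov = s_mm - gain @ s_mo.T
    return cond_mean, cond_cov


# Two correlated covariates; the first is observed, the second is missing.
mu = np.zeros(2)
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
cond_mean, cond_cov = gaussian_conditional(
    mu, sigma, x_obs=np.array([2.0]), obs_idx=[0], mis_idx=[1]
)
print(cond_mean, cond_cov)
```

This closed form covers only $p(\bm{x}_{\text{mis}} \mid \bm{x}_{\text{obs}})$. Once the label $y$ enters through a logistic likelihood, the exact posterior over the missing covariates is no longer Gaussian, which is where an approximating family such as $q_{\bm{\phi}}$ comes in.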

“we approximate these integrals via importance sampling using the variational distribution $q_{\phi}$ as the proposal distribution with $S$ draws per class”

Section 5 · Prediction with Missing Covariates
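The per-class importance-sampling step quoted above can be sketched as follows for a single missing covariate. This is a simplified illustration under stated assumptions: one observed and one missing scalar covariate, Gaussian proposals with fixed parameters standing in for the inference network, and no missingness term (in the MNAR case each weight would also be multiplied by $p_{\bm{\psi}}(\bm{r} \mid \bm{x}, y=c)$). All names and numbers are hypothetical.

```python
import numpy as np


def log_normal_pdf(x, mean, std):
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2


def log_sigmoid(t):
    return -np.logaddexp(0.0, -t)


def predict_proba_is(x_obs, beta0, beta_obs, beta_mis, prior, proposals,
                     S=2000, seed=0):
    """Estimate [p(y=0 | x_obs), p(y=1 | x_obs)] by importance sampling.

    prior: (mean, std) of p(x_mis | x_obs).
    proposals[c]: (mean, std) of the proposal q(x_mis | x_obs, y=c).
    """
    rng = np.random.default_rng(seed)
    log_scores = []
    for c in (0, 1):
        q_mean, q_std = proposals[c]
        x_mis = q_mean + q_std * rng.standard_normal(S)  # S draws per class
        logit = beta0 + beta_obs * x_obs + beta_mis * x_mis
        log_lik = log_sigmoid(logit) if c == 1 else log_sigmoid(-logit)
        log_w = (log_lik + log_normal_pdf(x_mis, *prior)
                 - log_normal_pdf(x_mis, q_mean, q_std))
        # Log of the average importance weight for class c.
        log_scores.append(np.logaddexp.reduce(log_w) - np.log(S))
    log_scores = np.array(log_scores)
    # Normalize the two class scores so they sum to one.
    return np.exp(log_scores - np.logaddexp.reduce(log_scores))


probs = predict_proba_is(
    x_obs=0.5, beta0=-0.2, beta_obs=1.0, beta_mis=0.8,
    prior=(0.0, 1.0), proposals={0: (-0.3, 1.1), 1: (0.3, 1.1)},
)
print(probs)
```

The loop over `c` makes the cost structure visible: every prediction needs two batches of `S` weighted samples, one per class.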

Evidence and comparison

The empirical evidence broadly supports the claims, though some comparisons require qualification. AV-LR attains a lower RMSE than SAEM for $\bm{\beta}$ estimation under MNAR (Table 3: 0.54 vs 0.66) while being ~67x faster, although the reported spreads (±0.21 and ±0.29) overlap. The comparison with DLGLM is more nuanced: DLGLM uses a neural network for the regression function $\eta(\bm{x}) = s_{\beta,\pi}(\bm{x})$, whereas AV-LR restricts itself to linear logistic regression, so the two methods are not matched in model capacity. The real-world experiments (Section 8) demonstrate robustness across the BankNote, Pima, Rice, and Breast Cancer datasets under three MNAR mechanisms, with AV-LR achieving the highest average AUC scores (Table 5: e.g., 0.973 vs 0.954 for Rice/Logistic). Notably, SAEM is excluded from the real-world MNAR comparisons due to computational constraints, which weakens claims of universal superiority in that regime.

“AV-LR: RMSE_beta 0.5400 ± 0.2100, DLGLM: RMSE_beta 0.5600 ± 0.2200, SAEM: RMSE_beta 0.6600 ± 0.2900”

Table 3 · Section 7.2.2

“DLGLM employs a deep neural network to model the relationship between covariates and response, i.e., $p(y|\bm{x})=\text{GLM}(y;\eta(\bm{x}))$ where $\eta(\bm{x})=s_{\beta,\pi}(\bm{x})$ is a neural network”

Section 6 · Comparison with VAE-based methods

Reproducibility

The paper provides substantial methodological detail, including exact architectures (single hidden layer, 128 units), optimization hyperparameters (150 epochs, batch size 256, learning rate $10^{-3}$), and data generation protocols. However, critical gaps remain: no code repository is mentioned or linked, the initialization of the neural network weights is unspecified, and random seeds are not provided. For the neural methods, the number of importance samples $K$ used in the IWELBO objective ($\mathcal{L}_{\mathrm{IW}}$ in Equation 6) is not explicitly stated in the main text, and the reparameterized sampling through the Cholesky factor $\bm{L}_{q,i}$ depends on implementation details that are not fully elaborated. The SAEM implementation details (convergence threshold $10^{-4}$, maximum 120 iterations) are provided, but the specific MCMC sampling schemes for the S-step are not, making independent reproduction of that baseline challenging.
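As an illustration of the kind of detail at stake, here is one common way to draw reparameterized samples through a Cholesky factor. The softplus transform on the diagonal, the `1e-6` jitter, and the names `raw_tril` and `sample_gaussian_cholesky` are assumptions made for this sketch; the review does not say which parameterization the paper uses.

```python
import numpy as np


def sample_gaussian_cholesky(mean, raw_tril, n_samples, rng):
    """Draw x = mean + L @ eps with eps ~ N(0, I).

    raw_tril is an unconstrained square matrix (e.g. a network output).
    Its strictly lower triangle is used as is, and its diagonal is passed
    through softplus so that L has a positive diagonal.
    """
    d = mean.shape[0]
    strict_lower = np.tril(raw_tril, k=-1)
    diag = np.logaddexp(0.0, np.diag(raw_tril)) + 1e-6  # softplus + jitter
    L = strict_lower + np.diag(diag)
    eps = rng.standard_normal((n_samples, d))
    return mean + eps @ L.T, L


rng = np.random.default_rng(0)
mean = np.array([1.0, -1.0])
raw_tril = np.array([[0.5, 9.9], [-0.7, -1.0]])  # upper entry is ignored
samples, L = sample_gaussian_cholesky(mean, raw_tril, 200_000, rng)

# The empirical covariance should be close to L @ L.T.
print(np.cov(samples.T))
print(L @ L.T)
```

Choices such as the diagonal transform and the jitter affect numerical stability and gradients, which is why leaving them unspecified hampers exact reproduction.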

“a single hidden layer with 128 units. These models are trained for 150 epochs with a batch size of 256 and a learning rate of $10^{-3}$”

Section 7.2 · Hyperparameters tuning
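One way to see the reproducibility gap at a glance is to write the reported settings down as a configuration object and leave unreported values empty. The class and field names below are hypothetical; only the numeric values quoted above come from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingConfig:
    # Values reported in the paper (Section 7.2).
    hidden_layers: int = 1
    hidden_units: int = 128
    epochs: int = 150
    batch_size: int = 256
    learning_rate: float = 1e-3
    # Values the review could not find in the main text.
    num_importance_samples: Optional[int] = None  # K in the IWELBO
    seed: Optional[int] = None
    weight_init: Optional[str] = None


config = TrainingConfig()
missing = [name for name, value in vars(config).items() if value is None]
print("unspecified:", missing)
```

Anyone attempting a reproduction would have to choose the three unspecified values themselves and report them alongside their results.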

“For each observation $i$, we draw $K$ independent samples from the variational posterior”

Section 3.1 · Variational Lower Bound and Importance-Weighted Estimation
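The role of $K$ is easiest to see in the generic importance-weighted bound, which averages $K$ importance weights inside the logarithm: $\log \frac{1}{K} \sum_{k=1}^{K} w_k$, with $w_k$ the ratio of the joint density to the variational density at the $k$-th sample. The sketch below implements that generic form with a numerically stable log-mean-exp. It assumes the paper's Equation 6 follows this standard shape, which the review does not reproduce, and the log-density arrays are placeholder numbers rather than outputs of a real model.

```python
import numpy as np


def iw_bound(log_joint, log_q):
    """Importance-weighted bound averaged over observations.

    log_joint, log_q: arrays of shape (n_observations, K) holding the log
    joint density and the log variational density at each of K samples.
    """
    log_w = log_joint - log_q
    m = log_w.max(axis=1, keepdims=True)  # subtract the max for stability
    log_mean_w = m[:, 0] + np.log(np.mean(np.exp(log_w - m), axis=1))
    return log_mean_w.mean()


# Placeholder log-densities for 4 observations and K = 10 samples.
rng = np.random.default_rng(0)
log_joint = rng.normal(-5.0, 1.0, size=(4, 10))
log_q = rng.normal(-2.0, 0.5, size=(4, 10))

bound_k10 = iw_bound(log_joint, log_q)
bound_k1 = iw_bound(log_joint[:, :1], log_q[:, :1])  # K = 1: ordinary ELBO
print(bound_k10, bound_k1)
```

With $K = 1$ the expression reduces to the ordinary ELBO estimate, and larger $K$ tightens the bound in expectation at the price of $K$ density evaluations per observation, so leaving $K$ unreported obscures both accuracy and cost.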

Abstract

Missing covariate data pose a significant challenge to statistical inference and machine learning, particularly for classification tasks like logistic regression. Classical iterative approaches (EM, multiple imputation) are often computationally intensive, sensitive to high missingness rates, and limited in uncertainty propagation. Recent deep generative models based on VAEs show promise but rely on complex latent representations. We propose Amortized Variational Inference for Logistic Regression (AV-LR), a unified end-to-end framework for binary logistic regression with missing covariates. AV-LR integrates a probabilistic generative model with a simple amortized inference network, trained jointly by maximizing the evidence lower bound. Unlike competing methods, AV-LR performs inference directly in the space of missing data without additional latent variables, using a single inference network and a linear layer that jointly estimate regression parameters and the missingness mechanism. AV-LR achieves estimation accuracy comparable to or better than state-of-the-art EM-like algorithms, with significantly lower computational cost. It naturally extends to missing-not-at-random settings by explicitly modeling the missingness mechanism. Empirical results on synthetic and real-world datasets confirm its effectiveness and efficiency across various missing-data scenarios.
