KHMP: Frequency-Domain Kalman Refinement for High-Fidelity Human Motion Prediction

cs.CV Wenhan Wu, Zhishuai Guo, Chen Chen, Srijan Das, Hongfei Xue, Pu Wang, Aidong Lu · Mar 22, 2026
AI Review
Plain-language introduction

Stochastic human motion prediction often suffers from high-frequency jitter and physically implausible poses. This paper proposes KHMP, a framework that combines training-time physical constraints (temporal smoothness and joint angle limits) with a novel inference-time refinement: an adaptive Kalman filter operating in the DCT frequency domain. The key innovation treats high-frequency DCT coefficients as a frequency-indexed noisy signal, recursively filtering them with parameters dynamically adjusted based on estimated Signal-to-Noise Ratio (SNR).
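A few lines of NumPy can check the intuition behind treating high-frequency DCT coefficients as the noisy part of a prediction. This is an illustrative sketch, not code from the paper. The orthonormal DCT-II is built as an explicit matrix, the trajectory is a toy sine wave, and the cutoff $k_0 = 10$ is borrowed from the hyperparameters listed in the supplementary material. A smooth trajectory keeps almost all of its energy in the first few coefficients. White jitter spreads evenly across all coefficients, so most of it lands in the high band.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: coeffs = C @ x and x = C.T @ coeffs."""
    k = np.arange(n)[:, None]                    # frequency index (rows)
    t = np.arange(n)[None, :]                    # time index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)                   # the DC row uses a smaller scale
    return c

def high_freq_energy_fraction(x: np.ndarray, k0: int) -> float:
    """Share of the signal's energy carried by DCT coefficients with index >= k0."""
    coeffs = dct_matrix(len(x)) @ x
    return float(np.sum(coeffs[k0:] ** 2) / np.sum(coeffs ** 2))

T = 100                                          # predicted frames (illustrative)
t = np.linspace(0.0, 1.0, T)
smooth = np.sin(2 * np.pi * t)                   # slow, clean joint-coordinate trajectory
rng = np.random.default_rng(0)
jittery = smooth + 0.1 * rng.standard_normal(T)  # same motion plus frame-level jitter

print(high_freq_energy_fraction(smooth, k0=10))   # small: energy sits in the low band
print(high_freq_energy_fraction(jittery, k0=10))  # larger: white jitter spreads over all k
```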

Critical review
Verdict
Bottom line

KHMP presents a compelling dual-pronged solution that genuinely advances motion prediction fidelity. The frequency-domain Kalman filtering is a novel application for this task, and the SNR-adaptive parameterization ($Q$ and $R$) provides a principled mechanism to avoid over-smoothing clean predictions while aggressively denoising jittery ones. The physical constraints used during training are well designed, particularly the cosine-based joint angle loss that avoids unstable arccos gradients. However, the method builds heavily on the SLD-HMP backbone. In addition, the assumption that adjacent DCT coefficients follow a first-order Gaussian-Markov process remains heuristic: it is not formally derived from motion dynamics.

“By entirely avoiding the unstable arccos operation, our formulation prevents angles from becoming too small... strictly enforcing biomechanical plausibility while ensuring robust explosion-free differentiability during training.”
KHMP paper · Section 4.2
“On HumanEva-I, the refinement increases inference time by approximately 1.2×, while on Human3.6M, the overhead is approximately 1.6× due to the larger number of joints.”
KHMP paper · Supplementary Material J
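The cosine-based joint angle constraint praised above can be sketched without any arccos call. The three-joint parameterization, the 30° minimum angle, and the hinge form below are hypothetical choices for illustration; Section 4.2 of the paper defines the actual loss. The sketch computes loss values only. In training, the same expression would be written in an autodiff framework. The second helper shows why avoiding arccos matters: its derivative $-1/\sqrt{1-c^2}$ diverges as the cosine $c$ approaches 1, whereas a hinge on $c$ has slope 0 or 1.

```python
import numpy as np

def joint_angle_penalty(parent, joint, child, min_angle_deg=30.0, eps=1e-8):
    """Hinge penalty on cos(theta) at `joint`; no arccos anywhere (hypothetical form).

    parent, joint, child: arrays of shape (..., 3) with 3D joint positions.
    The angle between the two bones is too small when cos(theta) > cos(theta_min).
    """
    u = parent - joint                                   # bone toward the parent joint
    v = child - joint                                    # bone toward the child joint
    cos_theta = np.sum(u * v, axis=-1) / (
        np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1) + eps
    )
    cos_min = np.cos(np.deg2rad(min_angle_deg))
    # Zero for wide angles; grows linearly in cos_theta as the bones fold together.
    return float(np.mean(np.maximum(cos_theta - cos_min, 0.0)))

def arccos_slope(c):
    """|d arccos(c)/dc| = 1/sqrt(1 - c^2), which diverges as c -> 1."""
    return 1.0 / np.sqrt(1.0 - c ** 2)

hip = np.array([0.0, 1.0, 0.0])
knee = np.array([0.0, 0.5, 0.0])
ankle_straight = np.array([0.0, 0.0, 0.0])   # knee angle of 180 degrees: no penalty
ankle_folded = np.array([0.05, 1.0, 0.0])    # knee almost fully folded: penalized

print(joint_angle_penalty(hip, knee, ankle_straight))
print(joint_angle_penalty(hip, knee, ankle_folded))
print(arccos_slope(0.999999))                # about 707: the blow-up an arccos loss would face
```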
What holds up

The adaptive Kalman refinement mechanism is rigorously derived and empirically validated. The steady-state error analysis (SupMat A) shows $P^* < R$. Under the filter's modeling assumptions, this means the refined coefficients have lower error variance than the raw observations, so the filter reduces MSE. The ablation in Table 2(c) demonstrates that adaptive parameterization outperforms fixed Kalman filtering (ADE 0.188 vs 0.194). The 28% average jitter reduction (Table 2e) comes together with accuracy improvements, which suggests that the method removes genuine noise rather than over-smoothing. The comparison against fixed frequency suppression (Table 2d) convincingly shows the value of sample-specific adaptation across all tested thresholds $\gamma \in [0.1, 0.9]$.

“Adaptive (Ours): ADE 0.188, FDE 0.204, APD 7.481 vs Fixed Kalman: ADE 0.194, FDE 0.209, APD 6.208”
KHMP paper · Table 2(c)
“This formulation enables adaptive smoothing with a clear physical interpretation: the filter intelligently applies stronger smoothing to signals estimated as noisy (low SNR) while faithfully tracking signals estimated as clean (high SNR).”
KHMP paper · Section 4.3
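The $P^* < R$ result attributed to SupMat A can be checked numerically in the simplest case. This sketch assumes a scalar random-walk model with constant $Q$ and $R$, set to $Q_0$ and $R_0$ from SupMat E. The paper's analysis may be more general. The fixed point of the variance recursion solves $P^2 + QP - QR = 0$. The left side equals $-QR < 0$ at $P = 0$ and $R^2 > 0$ at $P = R$, so the positive root always lies strictly between 0 and $R$.

```python
import math

def steady_state_posterior_variance(q: float, r: float, iters: int = 10_000) -> float:
    """Iterate the variance recursion of the scalar filter x_k = x_{k-1} + w_k, z_k = x_k + v_k."""
    p = r                               # start at the raw measurement-noise variance
    for _ in range(iters):
        p_prior = p + q                 # predict: P^- = P + Q
        gain = p_prior / (p_prior + r)  # Kalman gain: K = P^- / (P^- + R)
        p = (1.0 - gain) * p_prior      # update: P = (1 - K) P^-
    return p

def closed_form_posterior_variance(q: float, r: float) -> float:
    """Positive root of P^2 + Q P - Q R = 0, the fixed point of the recursion above."""
    return (-q + math.sqrt(q * q + 4.0 * q * r)) / 2.0

q0, r0 = 1e-6, 1e-2                     # Q_0 and R_0 from SupMat E
p_iter = steady_state_posterior_variance(q0, r0)
p_star = closed_form_posterior_variance(q0, r0)
print(p_iter, p_star, p_star / r0)      # P* is roughly 1% of R for these values
```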
Main concerns

First, the method's reliance on the SLD-HMP backbone raises the question of whether the improvements stem from the novel refinement or simply from better hyperparameter tuning. The baseline* in Table 1 uses KHMP's training setup without the physical constraints, yet it already outperforms the original SLD-HMP. Second, the paper claims to resolve the accuracy-diversity trade-off, but the diversity improvements (APD increases from 6.516 to 7.481 on HumanEva-I) are modest compared to the accuracy gains. Third, the assumption that adjacent DCT coefficients form a first-order Markov process ($x_k = x_{k-1} + w_k$) lacks spectral-theoretic justification. Frequency components are not inherently sequentially related in the way time-series samples are. Fourth, the 1.2×–1.6× inference overhead limits real-time applications.

“Baseline*: APD 6.516, ADE 0.196 vs KHMP: APD 7.481, ADE 0.188”
KHMP paper · Table 1
“By modeling a first-order Gaussian-Markov relationship between adjacent frequency components, the filter recursively smooths the high-frequency spectrum...”
KHMP paper · Section 4.3
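The minimal NumPy sketch below shows the recursion this concern is about: a scalar Kalman filter that runs along the DCT frequency index under the $x_k = x_{k-1} + w_k$ model. It is an illustration, not the authors' Algorithm 1. Several choices are assumptions made here: fixed $q$ and $r$ (set to $Q_0$ and $R_0$ from SupMat E), initialization from the first high-frequency coefficient, and the toy sine trajectory. Writing it out makes the concern concrete. The predict step simply carries the previous coefficient's estimate forward, so across frequency the filter behaves like a running average whose weight is set by the gain.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: coeffs = C @ x and x = C.T @ coeffs."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def kalman_over_frequency(coeffs: np.ndarray, k0: int, q: float, r: float) -> np.ndarray:
    """Scalar Kalman filter run along the frequency index k = k0, ..., N-1.

    State model:       x_k = x_{k-1} + w_k, w_k ~ N(0, q)  (first-order Gauss-Markov in k)
    Observation model: z_k = x_k + v_k,     v_k ~ N(0, r)  (z_k = raw DCT coefficient)
    Coefficients with index < k0 are passed through unchanged.
    """
    out = np.asarray(coeffs, dtype=float).copy()
    if k0 >= len(out):
        return out
    x, p = out[k0], r                # ASSUMPTION: start from the first high-band observation
    for k in range(k0 + 1, len(out)):
        p_prior = p + q              # predict: carry x forward, inflate variance by q
        gain = p_prior / (p_prior + r)
        x = x + gain * (out[k] - x)  # correct toward the raw coefficient z_k
        p = (1.0 - gain) * p_prior
        out[k] = x
    return out

T, K0 = 100, 10
t = np.linspace(0.0, 1.0, T)
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * t)
noisy = clean + 0.1 * rng.standard_normal(T)

C = dct_matrix(T)
refined = C.T @ kalman_over_frequency(C @ noisy, k0=K0, q=1e-6, r=1e-2)
print(np.mean((noisy - clean) ** 2), np.mean((refined - clean) ** 2))
```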
Evidence and comparison

The evidence supports the core claims reasonably well. The quantitative comparison against 16 prior methods (Table 1) shows that KHMP achieves state-of-the-art ADE/FDE on HumanEva-I and competitive MMADE/MMFDE on Human3.6M. The ablation studies properly isolate components: Table 2(a) shows that the physical constraints and the Kalman refinement contribute synergistically, while Table 2(d) validates the adaptive approach against static frequency suppression. However, the comparison to MotionWavelet (SupMat F) is brief. KHMP outperforms it, but MotionWavelet uses wavelets rather than the DCT, and the analysis could explore in more depth when a wavelet or a DCT basis is preferable. The visualizations (Figure 4) effectively demonstrate jitter reduction in trajectory space.

“KHMP (Full): ADE 0.188, FDE 0.204 vs w/o Phys. Loss: ADE 0.193, FDE 0.208 vs w/o Kalman: ADE 0.194, FDE 0.206”
KHMP paper · Table 2(a)
“Across all examples, the baseline prediction (red line) exhibits noticeable high-frequency jitter, whereas KHMP (blue line) significantly smooths the trajectory...”
KHMP paper · Figure 4
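The fixed-suppression baseline in Table 2(d) can be pictured as a global low-pass truncation. The review does not define $\gamma$, so this sketch assumes it is the fraction of DCT coefficients retained. It also uses the mean absolute second difference as a stand-in jitter measure, which may not match the paper's metric. With one global cutoff, a low $\gamma$ removes the jitter from a noisy prediction but also erases a genuine fast component from a clean one. Sample-specific adaptation is meant to avoid exactly this failure mode.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: coeffs = C @ x and x = C.T @ coeffs."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def fixed_suppression(x: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Zero every DCT coefficient above one global cutoff, whatever the input looks like.

    ASSUMPTION: keep_fraction plays the role of the threshold gamma in Table 2(d).
    """
    c = dct_matrix(len(x))
    coeffs = c @ x
    cutoff = max(1, int(round(keep_fraction * len(x))))
    coeffs[cutoff:] = 0.0
    return c.T @ coeffs

def jitter(x: np.ndarray) -> float:
    """Stand-in jitter measure: mean absolute second difference (discrete acceleration)."""
    return float(np.mean(np.abs(np.diff(x, n=2))))

T = 100
t = np.linspace(0.0, 1.0, T)
rng = np.random.default_rng(0)
slow = np.sin(2 * np.pi * t)
jittery = slow + 0.1 * rng.standard_normal(T)        # noisy prediction
detailed = slow + 0.2 * np.sin(2 * np.pi * 12 * t)   # clean prediction with genuine fast motion

for gamma in (0.1, 0.3, 0.5):
    detail_lost = np.mean((fixed_suppression(detailed, gamma) - detailed) ** 2)
    print(f"gamma={gamma}: jitter {jitter(jittery):.4f} -> "
          f"{jitter(fixed_suppression(jittery, gamma)):.4f}, detail lost {detail_lost:.5f}")
```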
Reproducibility

The paper provides sufficient detail for reproduction. The architecture follows SLD-HMP, and the modifications are clearly described. All hyperparameters are listed in SupMat E ($k_0=10$, $Q_0=1\times 10^{-6}$, $R_0=1\times 10^{-2}$, $\lambda_Q=0.2$, $\lambda_R=0.5$), and the loss weights are specified in Table 3. The code is publicly available at the provided GitHub repository. The DCT-Kalman refinement operates independently on each joint-coordinate channel, which makes the algorithm straightforward to implement (Algorithm 1 in SupMat I). However, the project link in the abstract is a repository clone URL ending in .git rather than a browsable project page. Reproduction also requires access to the full SLD-HMP implementation details for the baseline components.

“Our Frequency-Kalman refinement module is a post-processing step applied only during inference, with its hyperparameters set to $k_0=10, Q_0=1e-6, R_0=1e-2, \lambda_Q=0.2, \lambda_R=0.5$.”
KHMP paper · Supplementary Material E
“Our project is publicly available at: https://github.com/wenhanwu95/KHMP-Project-Page.git”
KHMP paper · Abstract
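SupMat E lists the parameters of the adaptive step ($k_0$, $Q_0$, $R_0$, $\lambda_Q$, $\lambda_R$), but this review does not reproduce how the estimated SNR is mapped to $Q$ and $R$. The sketch below is therefore a hypothetical parameterization. The band-energy SNR estimate and the power-law mapping are assumptions, chosen only to match the behavior described in Section 4.3: stronger smoothing at low SNR and closer tracking at high SNR. Following the review's description, it applies the estimate independently to each joint-coordinate channel. It then reports the steady-state Kalman gain that results, where a smaller gain means heavier smoothing.

```python
import numpy as np

# Hyperparameter values as listed in SupMat E.
K0, Q0, R0, LAMBDA_Q, LAMBDA_R = 10, 1e-6, 1e-2, 0.2, 0.5

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: coeffs = C @ x and x = C.T @ coeffs."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def adaptive_noise_params(motion: np.ndarray, k0: int = K0, eps: float = 1e-12):
    """Per-channel (Q, R) from a crude spectral SNR estimate.

    motion: array of shape (T, C), one column per joint-coordinate channel.
    HYPOTHETICAL: the band-energy SNR and the power-law mapping are illustrative
    stand-ins, not the formulas from the paper.
    """
    coeffs = dct_matrix(motion.shape[0]) @ motion   # DCT along time, each channel separately
    low = np.sum(coeffs[:k0] ** 2, axis=0)
    high = np.sum(coeffs[k0:] ** 2, axis=0)
    snr = (low + eps) / (high + eps)
    q = Q0 * snr ** LAMBDA_Q       # high SNR: larger process noise, so the filter tracks the data
    r = R0 * snr ** (-LAMBDA_R)    # low SNR: larger observation noise, so the filter smooths more
    return q, r

def steady_state_gain(q, r):
    """Steady-state gain of the scalar random-walk Kalman filter, elementwise."""
    p = (-q + np.sqrt(q * q + 4.0 * q * r)) / 2.0   # positive root of P^2 + QP - QR = 0
    p_prior = p + q
    return p_prior / (p_prior + r)

T = 100
t = np.linspace(0.0, 1.0, T)
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * t)
motion = np.stack([clean, clean + 0.1 * rng.standard_normal(T)], axis=1)  # two channels

q, r = adaptive_noise_params(motion)
print(steady_state_gain(q, r))   # the jittery second channel receives the smaller gain
```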
Abstract

Stochastic human motion prediction aims to generate diverse, plausible futures from observed sequences. Despite advances in generative modeling, existing methods often produce predictions corrupted by high-frequency jitter and temporal discontinuities. To address these challenges, we introduce KHMP, a novel framework featuring an adaptive Kalman filter applied in the DCT domain to generate high-fidelity human motion predictions. By treating high-frequency DCT coefficients as a frequency-indexed noisy signal, the Kalman filter recursively suppresses noise while preserving motion details. Notably, its noise parameters are dynamically adjusted based on estimated Signal-to-Noise Ratio (SNR), enabling aggressive denoising for jittery predictions and conservative filtering for clean motions. This refinement is complemented by training-time physical constraints (temporal smoothness and joint angle limits) that encode biomechanical principles into the generative model. Together, these innovations establish a new paradigm integrating adaptive signal processing with physics-informed learning. Experiments on the Human3.6M and HumanEva-I datasets demonstrate that KHMP achieves state-of-the-art accuracy, effectively mitigating jitter artifacts to produce smooth and physically plausible motions.
