MISApp: Multi-Hop Intent-Aware Session Graph Learning for Next App Prediction

cs.LG Yunchi Yang, Longlong Li, Jianliang Wu, Cunquan Qu · Mar 23, 2026
AI Review
Plain-language introduction

Next app prediction struggles when user intent shifts rapidly and historical profiles are sparse. MISApp tackles this via multi-hop session graphs that decompose transitions into 1-, 2-, and 3-hop structural ranges, using LightGCN for lightweight propagation and a Transformer encoder-decoder to model intent evolution without requiring static user profiles, aiming for robust cold-start performance.

Critical review
Verdict
Bottom line

MISApp presents a methodologically sound approach to session-based next-app prediction, combining multi-hop graph decomposition with cross-modal fusion to achieve strong empirical results on standard and cold-start splits. The architectural contributions are incremental (applying LightGCN to multi-hop session graphs is not novel), but the integration with dynamic intent modeling via Transformers is well executed. Reproducibility is hampered by the absence of released code, and some baseline results, particularly Appformer's, are anomalously poor, which raises questions about experimental fairness.

“we propose MISApp, a profile-free framework that infers next-app intent directly from session-level behaviors without assuming the availability of static user profiles”
paper · 1 Introduction
“Appformer ... 0.2722 ... MISApp ... 0.5424”
paper · Table 2
What holds up

The multi-hop decomposition of session graphs ($\mathcal{G}_{1}^{S}, \mathcal{G}_{2}^{S}, \mathcal{G}_{3}^{S}$) with explicit hop-level attention ($W_{hop_{\gamma}}$) is a principled way to capture distinct structural ranges without over-smoothing, and the use of LightGCN avoids the heavy parameterization of standard GCNs. The strict cold-start protocol, which splits users 90/10 rather than splitting chronologically, supports the claim of profile-free generalization. The efficiency analysis shows a favorable trade-off: MISApp achieves higher accuracy (ACC@1 0.5424) than MAPLE (0.5191) with roughly 45× fewer parameters (1.32M vs 60M).

“constructs 1-Hop, 2-Hop, and 3-Hop session graphs based on temporal relations between apps ... LightGCN is applied to each graph to learn structural representations”
paper · Section 3.2
“MISApp ... Parameters (M) 1.32 ... MAPLE ... 60.0”
paper · Table 6
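
The multi-hop decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a k-hop edge links the app at position i to the app at position i + k within one session, and it stands in for the hop-level attention with a plain softmax over hypothetical per-hop scores. The names build_hop_graphs and fuse_hops, the toy session, and the toy vectors are all illustrative.

```python
import math
from collections import defaultdict


def build_hop_graphs(session, max_hop=3):
    """Return {hop: {(src, dst): count}} for hop = 1..max_hop.

    Assumption: a k-hop edge connects session[i] to session[i + k].
    """
    graphs = {}
    for hop in range(1, max_hop + 1):
        edges = defaultdict(int)
        for i in range(len(session) - hop):
            edges[(session[i], session[i + hop])] += 1
        graphs[hop] = dict(edges)
    return graphs


def fuse_hops(hop_vectors, hop_scores):
    """Softmax-weighted sum of per-hop session vectors (hypothetical fusion)."""
    peak = max(hop_scores)  # subtract the max for numerical stability
    exp_scores = [math.exp(s - peak) for s in hop_scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    dim = len(hop_vectors[0])
    fused = [
        sum(w * vec[d] for w, vec in zip(weights, hop_vectors))
        for d in range(dim)
    ]
    return weights, fused


# Toy session of app launches in temporal order (illustrative data).
session = ["chat", "browser", "chat", "maps", "browser"]
graphs = build_hop_graphs(session)

# Toy per-hop vectors; in the paper these would come from LightGCN.
weights, fused = fuse_hops([[3.0, 0.0], [0.0, 3.0], [3.0, 3.0]], [0.0, 0.0, 0.0])
```

With equal scores, the three hops receive equal weight; a learned scoring function would shift weight toward whichever structural range is most informative for the session.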
Main concerns

The baseline comparison against Appformer is problematic: Appformer achieves only 0.2722 ACC@1 on Tsinghua App Usage versus MISApp’s 0.5424. The gap is so large, and leaves Appformer closer to the simple MFU heuristic (0.1841) than to MISApp, that it suggests implementation errors or severe hyperparameter misconfiguration rather than architectural inferiority. Additionally, the spatial context module relies on base station IDs available only in Tsinghua, so the ‘w/o Spatial Context’ ablation cannot be evaluated on LSapp (Table 4 shows dashes in the LSapp columns). The interpretability claim, based on a Kendall’s Tau of 0.71 between hop weights and PMI statistics, rests on only 50 hand-selected samples, which is insufficient for robust validation of attention alignment.

“Appformer ... 0.2722”
paper · Table 2
“w/o Spatial Context ... -- ... (LSapp columns)”
paper · Table 4
“average Kendall's coefficient of 0.71 ... based on ... 50 samples”
paper · Section 4.8.1
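
To make the sample-size concern concrete, the following sketch computes Kendall's Tau for a single sample. It assumes, without confirmation from the paper, that the coefficient is computed per sample over the three hop weights and three matching PMI statistics; under that assumption each sample can only yield one of four values (-1, -1/3, 1/3, 1), so an average over 50 samples is a coarse statistic. The function name and the numbers are illustrative, not taken from the paper.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length score lists (ties count as zero)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical hop weights and PMI statistics for one session sample.
hop_weights = [0.5, 0.3, 0.2]
pmi_stats = [1.2, 0.4, 0.9]
tau = kendall_tau(hop_weights, pmi_stats)  # two concordant pairs, one discordant
```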
Evidence and comparison

The evidence supports the core claim that multi-hop modeling improves over single-hop: in Table 4, ‘w/o Multi-hop Graph’ lowers ACC@1 from 0.5424 to 0.5300 on Tsinghua, an absolute drop of 0.0124, and the ablation supports each component’s contribution. However, the improvement over MAPLE, a strong LLM-based baseline, is modest (0.5424 vs 0.5191), while the gap to Appformer is suspiciously extreme. The paper does not clarify whether Transformer-based time-series models (FEDformer, TimesNet) were tuned for session length $T=8$ or used default configurations, which matters because these models are designed for longer sequences. The claim of ‘profile-free’ operation holds for the cold-start split, though it conflates ‘no historical profile’ with ‘no user ID’: the model still requires an active session history to construct the multi-hop graph.

“w/o Multi-hop Graph ... 0.5300 ... MISApp ... 0.5424”
paper · Table 4
“MAPLE ... 0.5191 ... MISApp ... 0.5424”
paper · Table 2
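
The size of these gaps is easier to compare in relative terms. The short calculation below uses only the ACC@1 values quoted in this review; the dictionary keys and the helper relative_gain are illustrative names.

```python
# ACC@1 values on Tsinghua App Usage as quoted in this review.
acc1 = {
    "MISApp": 0.5424,
    "MAPLE": 0.5191,
    "Appformer": 0.2722,
    "w/o Multi-hop Graph": 0.5300,
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old


gain_over_maple = relative_gain(acc1["MISApp"], acc1["MAPLE"])
gain_over_appformer = relative_gain(acc1["MISApp"], acc1["Appformer"])
# Relative loss when the multi-hop graph is removed from the full model.
ablation_loss = (acc1["MISApp"] - acc1["w/o Multi-hop Graph"]) / acc1["MISApp"]

print(f"vs MAPLE:     {gain_over_maple:.2%}")
print(f"vs Appformer: {gain_over_appformer:.2%}")
print(f"ablation:     {ablation_loss:.2%}")
```

The relative gain over MAPLE is about 4.5%, the gain over Appformer is about 99%, and removing the multi-hop graph costs about 2.3% of the full model's ACC@1.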
Reproducibility

The paper provides comprehensive hyperparameter details ($d=64$, $K=3$, $T=8$, $L_g=2$ layers, Adam optimizer with lr=0.001) and uses public datasets (Tsinghua App Usage, LSapp), which aids reproduction. However, no code repository, data preprocessing scripts, or random seed specifications are provided in the arXiv entry, creating friction for independent verification. The cold-start evaluation relies on a specific user-level split (90/10) that must be reproduced exactly to match the reported numbers; without code, differences in session segmentation ($\delta_t = 300$ s) or base station clustering ($k_{loc}=5$) could yield divergent results. The apparent underperformance of key baselines (Appformer, SA-GCN) suggests that baseline implementations may not have been standardized, which would hinder fair reproduction.

“embedding dimension of 64 ... immediate intent window size is set to $K=3$ ... session length is fixed to $T=8$ ... LightGCN module is set to $L_g=2$”
paper · Section 4.2
“$\delta_t = 300$ seconds”
paper · Section 3
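
The two preprocessing steps most likely to diverge between reimplementations can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes a new session starts when the gap between consecutive launches is strictly greater than $\delta_t$ (the paper may use a different boundary rule), and the seed value and function names are hypothetical.

```python
import random


def segment_sessions(timestamps, gap_seconds=300):
    """Split sorted launch timestamps (in seconds) into sessions.

    Assumption: a gap strictly greater than gap_seconds starts a new session.
    """
    sessions, current = [], []
    for ts in timestamps:
        if current and ts - current[-1] > gap_seconds:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions


def split_users(user_ids, test_fraction=0.1, seed=0):
    """User-level split: test users never appear in training (seed is hypothetical)."""
    users = sorted(set(user_ids))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(users)
    n_test = max(1, round(len(users) * test_fraction))
    return users[n_test:], users[:n_test]


sessions = segment_sessions([0, 100, 350, 700, 1001, 1100])
train_users, test_users = split_users(range(20))
```

Changing the boundary rule from strictly greater to greater-or-equal, or using a different seed, changes which sessions and users land in the test set, which is why an unreleased pipeline makes exact replication difficult.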
Abstract

Predicting the next mobile app a user will launch is essential for proactive mobile services. Yet accurate prediction remains challenging in real-world settings, where user intent can shift rapidly within short sessions and user-specific historical profiles are often sparse or unavailable, especially under cold-start conditions. Existing approaches mainly model app usage as sequential behavior or local session transitions, limiting their ability to capture higher-order structural dependencies and evolving session intent. To address this issue, we propose MISApp, a profile-free framework for next app prediction based on multi-hop session graph learning. MISApp constructs multi-hop session graphs to capture transition dependencies at different structural ranges, learns session representations through lightweight graph propagation, incorporates temporal and spatial context to characterize session conditions, and captures intent evolution from recent interactions. Experiments on two real-world app usage datasets show that MISApp consistently outperforms competitive baselines under both standard and cold-start settings, while maintaining a favorable balance between predictive accuracy and practical efficiency. Further analyses show that the learned hop-level attention weights align well with structural relevance, offering interpretable evidence for the effectiveness of the proposed multi-hop modeling strategy.
