MIHT: A Hoeffding Tree for Time Series Classification using Multiple Instance Learning

cs.LG Aurora Esteban, Amelia Zafra, Sebasti\'an Ventura · Mar 23, 2026
Local to this browser
What it does
MIHT tackles Time Series Classification (TSC) with variable-length, multivariate data—common in sensor and healthcare applications. The core idea combines Multiple Instance Learning (MIL) with Hoeffding Trees (incremental decision trees)...
Why it matters
The core idea combines Multiple Instance Learning (MIL) with Hoeffding Trees (incremental decision trees) to represent series as overlapping subseries bags and iteratively optimize which $k$ consecutive subseries are most discriminative....
Main concern
The paper presents a plausible integration of MIL with incremental decision trees, but its empirical claims are substantially weakened by methodological choices that handicap comparison methods. The novelty lies in adapting Hoeffding Trees...
Community signal
0
0 up · 0 down
Sign in to vote with arrows
AI Review AI reviewed
Plain-language introduction

MIHT tackles Time Series Classification (TSC) with variable-length, multivariate data—common in sensor and healthcare applications. The core idea combines Multiple Instance Learning (MIL) with Hoeffding Trees (incremental decision trees) to represent series as overlapping subseries bags and iteratively optimize which $k$ consecutive subseries are most discriminative. The approach promises both handling of unequal-length inputs and interpretability via a single tree structure.

Critical review
Verdict
Bottom line

The paper presents a plausible integration of MIL with incremental decision trees, but its empirical claims are substantially weakened by methodological choices that handicap comparison methods. The novelty lies in adapting Hoeffding Trees for instance-selection within bags; however, the iterative optimization lacks theoretical guarantees (convergence, complexity), and key empirical comparisons rely on truncation strategies that disadvantage standard TSC baselines. The interpretability benefits are demonstrated but not quantified (e.g., tree depth/leaf counts are absent). The work is publishable with major caveats if the comparison methodology is explicitly acknowledged as a limitation.

“For methods that only support fixed-length TSC, the variable-length problems have been truncated to their shortest series”
paper · Section 5.2
“MIHT parameters: inst length $\omega$: 21% of the series, inst overlap $\lambda$: 2%, $k$: 4”
paper · Table 2
What holds up

The representation of time series as ordered bags of overlapping subseries (Section 4.1) is sensible for variable-length data and naturally integrates with MIL assumptions. The use of Hoeffding Trees is well-motivated for incremental model adaptation during the instance-selection loop. Section 5.3 provides concrete interpretability demonstrations showing identified discriminative variables ("dimension 0 at the 36th instant") and relevant temporal segments via the $\tau$ selection mechanism.

“The model identifies dimension 0 at the 36th instant as the most discriminative feature, with a threshold of $-4.86$”
paper · Section 5.3
“The original time series $X$ is transformed into a bag $B=[\mathbf{i}_0,...,\mathbf{i}_n]$ as an ordered succession of subseries”
paper · Section 4.1
Main concerns

The primary flaw is the truncation of variable-length series to their shortest length for fixed-length baseline methods (DrCif, ST, MUSE, ROCKET, TapNet, InceptionTime), which severely handicaps their performance and renders the comparison unfair—especially since MIHT is explicitly designed for variable length. The 'heuristic search' for hyperparameters (Table 2) lacks detail on validation procedures and whether test data was excluded. The convergence of the iterative optimization loop (Algorithm 1, lines 19-23 'while not convergence') has no theoretical analysis or empirical characterization of iterations required. Performance instability is evident: MIHT achieves only 3.8% accuracy on GestMidAirD3 versus 18.5-43.1% for other methods (Table 3), suggesting brittle behavior on certain temporal structures.

“For methods that only support fixed-length TSC, the variable-length problems have been truncated to their shortest series”
paper · Section 5.1
“GMAirD3: MIHT 0.038, DrCIF 0.185, HIVECOTE2 0.238”
paper · Table 3
“The parameters for MIHT have been established based on a heuristic search using a subset of the training partitions”
paper · Table 2
Evidence and comparison

The claim that MIHT 'outperforms 11 state-of-the-art models' (achieving best results on 11 of 28 datasets) is technically accurate but overstated given the handicapped baselines. While Friedman tests show statistically significant differences, the absolute performance gaps in Table 4 are modest (MIHT accuracy 0.5967 versus HIVECOTE2 at 0.5449). Notably, MIHT fails to complete or is excluded from comparison on InsectWingbeat and SpokenArabDig for TapNet/InceptionTime—datasets where deep learning methods struggle. The comparison mixes scenarios where baselines were truncated (favoring MIHT) with scenarios where MIHT wins on inherent capability (high-dimensional data like InsectWingbeat with 200 variables).

“MIHT's superiority, as it outperforms 11 state-of-the-art time series classification models”
paper · Abstract
“MIHT: 0.5967, DrCIF: 0.5470, HIVECOTE2: 0.5449”
paper · Table 4
Reproducibility

Code availability is explicitly stated via GitHub repository with 'instructions for reproducible experimentation.' Hyperparameters are detailed in Table 2, including $\omega=21\%$, $\lambda=2\%$, $k=4$, $\kappa=366.5\%$, $\delta=0.005615$, though the derivation of these specific values (especially non-intuitive percentages) is unexplained. Critical blockers to independent reproduction include: (1) The 'heuristic search' for parameters lacks implementation details; (2) The optimization stopping criterion in Algorithm 1 ('not convergence') is undefined—no threshold, maximum iterations, or stability metric is specified beyond the 'max iterations: 100' listed in Table 2 which conflicts with the while-loop structure; (3) Random initialization effects and seed values are not discussed.

“The source code of the proposal is publicly available in the repository associated”
paper · Section 5.1
“while not convergence”
paper · Algorithm 1
“max iterations: 100”
paper · Table 2
Abstract

Due to the prevalence of temporal data and its inherent dependencies in many real-world problems, time series classification is of paramount importance in various domains. However, existing models often struggle with series of variable length or high dimensionality. This paper introduces the MIHT (Multi-instance Hoeffding Tree) algorithm, an efficient model that uses multi-instance learning to classify multivariate and variable-length time series while providing interpretable results. The algorithm uses a novel representation of time series as "bags of subseries," together with an optimization process based on incremental decision trees that distinguish relevant parts of the series from noise. This methodology extracts the underlying concept of series with multiple variables and variable lengths. The generated decision tree is a compact, white-box representation of the series' concept, providing interpretability insights into the most relevant variables and segments of the series. Experimental results demonstrate MIHT's superiority, as it outperforms 11 state-of-the-art time series classification models on 28 public datasets, including high-dimensional ones. MIHT offers enhanced accuracy and interpretability, making it a promising solution for handling complex, dynamic time series data.

Challenge the Review

Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.

No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.