Ontology-driven personalized information retrieval for XML documents

cs.IR cs.LG Ounnaci Iddir, Ahmed-ouamer Rachid, Tai Dinh · Mar 22, 2026

AI Review

Plain-language introduction

This paper addresses personalized information retrieval for XML documents by representing users, queries, and documents as weighted concept vectors derived from a domain ontology. The core idea is a hierarchical weighting scheme that favors specific (deeper) ontology concepts combined with a dynamic profile update mechanism that reinforces concepts based on user interactions. The work targets the limitation of traditional keyword-based systems that return identical results regardless of user knowledge or preferences.
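To make the idea of weighted concept vectors concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's method: the toy ontology `PARENT`, the linear depth weighting in `depth_weight`, and the use of cosine similarity are all hypothetical choices made for this example. The paper's actual weighting formula and similarity measure are not reproduced here.

```python
import math

# Hypothetical toy ontology (child -> parent); not taken from the paper.
PARENT = {
    "computer_science": None,
    "databases": "computer_science",
    "xml_databases": "databases",
    "networks": "computer_science",
}


def depth(concept: str) -> int:
    """Number of edges between a concept and the ontology root."""
    d = 0
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        d += 1
    return d


def depth_weight(concept: str) -> float:
    """Assumed weighting: deeper (more specific) concepts weigh more."""
    max_depth = max(depth(c) for c in PARENT)
    return (1 + depth(concept)) / (1 + max_depth)


def concept_vector(counts: dict[str, int]) -> dict[str, float]:
    """Turn concept occurrence counts into a depth-weighted vector."""
    return {c: n * depth_weight(c) for c, n in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(v * b.get(c, 0.0) for c, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = concept_vector({"xml_databases": 1})
    doc_a = concept_vector({"xml_databases": 2, "databases": 1})
    doc_b = concept_vector({"networks": 3})
    print(cosine(query, doc_a), cosine(query, doc_b))
```

In this toy setup a document that mentions the specific concept `xml_databases` scores higher against the query than one that only mentions an unrelated concept, which is the behavior the weighting scheme is meant to encourage.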

Critical review

Verdict

Bottom line

The paper proposes a conceptually sound framework for combining semantic ontologies with XML structural retrieval, but suffers from limited experimental scale and mathematically ad-hoc design choices. The reported improvements are substantial: averaged over five requests, precision increases from $P=0.426$ to $P=0.710$ and recall from $R=0.756$ to $R=0.978$. However, the evaluation relies on a small custom dataset of only 400 documents with synthetically generated queries, severely limiting generalizability.

“lower-level nodes in the hierarchy provide more precise and targeted information”

paper · Abstract

“Over the five requests in Table 2, the baseline averages are P=0.426 and R=0.756, while the proposed configuration achieves higher averages of P=0.710 and R=0.978”

paper · Section 5.3
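The size of the reported gains can be checked with a few lines of arithmetic. The sketch below uses only the four averages quoted above; the helper names `f1` and `relative_gain` are introduced here for illustration. Note that an F1 value computed from averaged precision and recall generally differs from the mean of per-request F1 scores, so these numbers are not expected to match any F1 figure reported in the paper.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def relative_gain(old: float, new: float) -> float:
    """Relative improvement of new over old, as a fraction of old."""
    return (new - old) / old


# Averages over the five requests, as quoted from Section 5.3.
BASELINE = {"precision": 0.426, "recall": 0.756}
PROPOSED = {"precision": 0.710, "recall": 0.978}

if __name__ == "__main__":
    for metric in ("precision", "recall"):
        gain = relative_gain(BASELINE[metric], PROPOSED[metric])
        print(f"{metric}: {BASELINE[metric]:.3f} -> {PROPOSED[metric]:.3f} "
              f"({gain:+.1%} relative)")
    # F1 of the averaged values, not an average of per-request F1 scores.
    print(f"F1 of averages, baseline: {f1(**BASELINE):.3f}")
    print(f"F1 of averages, proposed: {f1(**PROPOSED):.3f}")
```

Precision improves by roughly two thirds in relative terms and recall by a little under one third, so most of the gain comes from precision.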

What holds up

The integration of ontology-based concept weighting with XML hierarchical structure is well-motivated, and the taxonomy of related work comprehensively maps the landscape of XML IR and personalization. The paper correctly identifies that emphasizing specific concepts improves retrieval granularity, and the results consistently show that "incorporating ontology weights and the interest vector improves both precision and recall across the tested requests." The dynamic profile update mechanism offers a plausible approach to adapting to evolving user interests over successive queries.

“incorporating ontology weights and the interest vector improves both precision and recall across the tested requests”

paper · Section 5.3

Main concerns

Two major issues undermine the technical contribution. First, the mathematical formulations for both concept weighting and profile updates appear arbitrary and lack theoretical justification. This applies to the margin calculation $\Delta = 1/(\sum (Coef(C_k) - Coef(C_1)))^2$ and to the exponential update rule $w_{CI_j}(t+1) = (e^{w_{tj}} - 1) + w_{CI_j}(t)$ (Eq. 5). The update rule also risks numerical instability: for non-negative $w_{tj}$, every interaction adds a non-negative increment $e^{w_{tj}} - 1$, so profile weights can grow without bound unless they are normalized. Second, the evaluation relies entirely on a custom-built corpus of only 400 documents with queries generated from ontology concepts, which may not reflect realistic user information needs and severely constrains comparability with standard benchmarks.

“w_{CI_j}(t+1) = (e^{w_{tj}} - 1) + w_{CI_j}(t)”

paper · Section 4.1.1, Eq. 5

“we built our own test database. It contains 400 XML documents covering several computer science topics”

paper · Section 5.1
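The concern about unbounded weights can be illustrated with a short simulation of Eq. 5 as quoted above. This is a sketch under assumptions: the initial weight, the interaction weights, and the `l1_normalize` step are hypothetical and are not taken from the paper, which may bound the weights in some other way.

```python
import math


def update_weight(current: float, w_tj: float) -> float:
    """Eq. 5 as quoted: w(t+1) = (e^{w_tj} - 1) + w(t)."""
    return (math.exp(w_tj) - 1.0) + current


def simulate(initial: float, interaction_weights: list[float]) -> list[float]:
    """Apply the update once per interaction and record every value."""
    history = [initial]
    for w_tj in interaction_weights:
        history.append(update_weight(history[-1], w_tj))
    return history


def l1_normalize(weights: dict[str, float]) -> dict[str, float]:
    """Hypothetical remedy (not from the paper): rescale weights to sum to 1."""
    total = sum(weights.values())
    if total == 0.0:
        return dict(weights)
    return {concept: value / total for concept, value in weights.items()}


if __name__ == "__main__":
    # Assumed scenario: 50 interactions, each with term weight 0.5.
    history = simulate(initial=0.2, interaction_weights=[0.5] * 50)
    print(f"start={history[0]:.3f} end={history[-1]:.3f}")
    print(l1_normalize({"xml_databases": history[-1], "networks": 0.2}))
```

For a constant $w_{tj}$ the weight grows linearly with the number of interactions, with a step size that is exponential in $w_{tj}$. Nothing in the rule as quoted ever decreases a weight, so a concept the user stopped caring about would keep its accumulated value unless some decay or normalization is applied.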

Evidence and comparison

The evidence supports the relative claim that ontology weighting improves over the non-weighted baseline, but absolute comparisons with modern methods are absent. The authors acknowledge that "datasets, ontologies, query sets, and relevance assessment procedures differ substantially across related studies," yet they do not attempt to normalize against or compare with neural retrieval methods that might exploit similar structural signals. The use of synthetic queries generated by "selecting ontology concepts and using their keywords" rather than organic user queries limits ecological validity.

“datasets, ontologies, query sets, and relevance assessment procedures differ substantially across related studies”

paper · Section 5.3

“generated 300 queries by selecting ontology concepts and using their keywords to simulate different information needs”

paper · Section 5.1

Reproducibility

Reproducibility is promised but not yet delivered. The authors state they "will make the source code and experimental data publicly available to support reproducibility," but no DOI, repository link, or supplementary materials are provided in the current version. The custom dataset construction and ontology instantiation are described only at a high level. Without access to the specific OWL ontology files, the 400 XML documents, the generated queries, and the exact relevance judgment protocol, independent validation of the claimed F1-score of 0.781 remains impossible. The implementation uses Java, Servlets, and the Jena API, but hyperparameters for the profile update mechanism are not specified.

“we will make the source code and experimental data publicly available to support reproducibility”

paper · Introduction, end of Section 1

Abstract

This paper addresses the challenge of improving information retrieval from semi-structured eXtensible Markup Language (XML) documents. Traditional information retrieval systems (IRS) often overlook user-specific needs and return identical results for the same query, despite differences in users' knowledge, preferences, and objectives. We integrate external semantic resources, namely a domain ontology and user profiles, into the retrieval process. Documents, queries, and user profiles are represented as vectors of weighted concepts. The ontology applies a concept-weighting mechanism that emphasizes highly specific concepts, as lower-level nodes in the hierarchy provide more precise and targeted information. Relevance is assessed using semantic similarity measures that capture conceptual relationships beyond keyword matching, enabling personalized and fine-grained matching among user profiles, queries, and documents. Experimental results show that combining ontologies with user profiles improves retrieval effectiveness, achieving higher precision and recall than keyword-based approaches. Overall, the proposed framework enhances the relevance and adaptability of XML search results, supporting more user-centered retrieval.
