CornOrb: A Multimodal Dataset of Orbscan Corneal Topography and Clinical Annotations for Keratoconus Detection

cs.CV Mohammed El Amine Lazouni, Leila Ryma Lazouni, Zineb Aziza Elaouaber, Mohammed Ammar, Sofiane Zehar, Mohammed Youcef Bouayad Agha, Ahmed Lazouni, Amel Feroui, Ali H. Al-Timemy, Siamak Yousefi, Mostafa El Habib Daho · Mar 22, 2026
AI Review
Plain-language introduction

CornOrb addresses a persistent gap in ophthalmic AI by providing one of the first large-scale, publicly accessible Orbscan 3 corneal topography datasets. The collection comprises 1,454 eyes from 744 Algerian patients, offering four standardized corneal maps (axial curvature, anterior/posterior elevation, pachymetry) alongside structured clinical parameters including Kmax, astigmatism, and asphericity. By releasing this multimodal resource in standardized PNG and CSV formats, the authors aim to enable robust AI-driven detection of keratoconus using device-specific data from an underrepresented African population.

Critical review
Verdict
Bottom line

CornOrb is a well-curated dataset descriptor that delivers on its promise of filling a specific niche—publicly available Orbscan 3 data for keratoconus research. The paper is rigorous in its statistical characterization (Welch’s t-tests with Benjamini–Hochberg correction) and adheres to FAIR principles. However, as a pure data descriptor without any baseline machine learning experiments, it stops short of demonstrating the dataset’s discriminative utility or interoperability with existing algorithms. The moderate class imbalance (889 normal vs. 565 keratoconus eyes, roughly 61:39) is acknowledged but presents a manageable rather than severe challenge for model training.

“All variables differed significantly after correction (all p_BH<0.001)”
CornOrb paper · Table 2
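For readers who want to reproduce this kind of characterization, the Benjamini–Hochberg step-up adjustment is short to implement. The Python sketch below is not the authors' code. `welch_t` recomputes Welch's t and the Welch–Satterthwaite degrees of freedom from summary statistics, and `normal_approx_p` substitutes a normal approximation for the t distribution, which is only reasonable when the degrees of freedom are large. Both helper names are hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def normal_approx_p(t):
    """Two-sided p-value via a normal approximation (assumption: large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Age row quoted from Table 2; group sizes assumed to be the 889 normal and 565 keratoconus eyes.
    t, df = welch_t(33.5, 7.4, 889, 27.9, 7.1, 565)
    print(f"t = {t:.2f}, df = {df:.0f}, p ~ {normal_approx_p(t):.1e}")
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Plugging in the age row of Table 2 (33.5±7.4 vs 27.9±7.1) with the eye counts 889 and 565 gives t ≈ 14.4. Treating fellow eyes as independent overstates the effective sample size, so figures obtained this way are illustrative rather than a re-analysis.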
“the dataset also presents a moderate class imbalance (889 normal vs. 565 keratoconus eyes), reflecting real-world prevalence but requiring appropriate handling”
CornOrb paper · Limitations section
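One common way to handle the 889:565 split is inverse-frequency class weighting. The sketch below uses the heuristic n_samples / (n_classes × n_c), the same formula scikit-learn applies for `class_weight="balanced"`. Choosing weighting over resampling or threshold tuning is an assumption here, not something the paper prescribes, and `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Eye-level label counts reported in the paper: 889 normal, 565 keratoconus.
    labels = ["normal"] * 889 + ["keratoconus"] * 565
    weights = balanced_class_weights(labels)
    print({k: round(v, 3) for k, v in weights.items()})
```

With the reported counts this yields weights of about 0.818 (normal) and 1.287 (keratoconus), so each keratoconus eye contributes roughly 1.57 times as much to a weighted loss, and the two classes carry equal total weight.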
What holds up

The dataset structure is methodologically sound and immediately usable: 560×560 pixel PNG images with consistent naming conventions (patientID_eye_maptype.png) linked to a comprehensive CSV containing 15 clinical variables. The hierarchical organization by patient code and laterality (OD/OS) makes inter-eye correlation easy to handle, for example by keeping both eyes of a patient in the same data split. The authors appropriately highlight that this is a geographically distinct cohort (North Africa), underrepresented in existing keratoconus datasets, which are predominantly Pentacam-based. The data dictionary in Table 1 is complete, with precise units and descriptions for each variable.

“File names explicitly encode the patient identifier, eye laterality, and map type (e.g., 001_OD_Axial.png), ensuring easy reproducibility and traceability in AI workflows”
CornOrb paper · Data Description
“large-scale resources based on Orbscan 3 imaging are virtually absent”
CornOrb paper · Background
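Because the naming convention encodes patient, laterality, and map type, eye-level grouping can be recovered from file names alone. The sketch below assumes names follow the quoted pattern (e.g. `001_OD_Axial.png`). The regular expression and the `parse_filename` and `group_by_eye` helpers are hypothetical; only `Axial` appears in the quoted example, so other map-type strings (such as the illustrative `Pachymetry` below) should be checked against the actual files.

```python
import re
from collections import defaultdict

# Hypothetical pattern for names like "001_OD_Axial.png": patient ID, laterality, map type.
FILENAME_RE = re.compile(r"^(?P<patient>[^_]+)_(?P<eye>OD|OS)_(?P<map>.+)\.png$")


def parse_filename(name):
    """Return (patient_id, eye, map_type), or None if the name does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return match.group("patient"), match.group("eye"), match.group("map")


def group_by_eye(filenames):
    """Map (patient_id, eye) -> {map_type: filename}, skipping unparseable names."""
    eyes = defaultdict(dict)
    for name in filenames:
        parsed = parse_filename(name)
        if parsed is None:
            continue
        patient, eye, map_type = parsed
        eyes[(patient, eye)][map_type] = name
    return dict(eyes)


if __name__ == "__main__":
    files = ["001_OD_Axial.png", "001_OD_Pachymetry.png", "001_OS_Axial.png", "notes.txt"]
    for key, maps in sorted(group_by_eye(files).items()):
        print(key, sorted(maps))
```

An eye whose dictionary holds fewer than four map types points to a missing or misnamed file.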
Main concerns

The diagnostic labels rely on interpretation by two junior ophthalmologists with senior adjudication for disagreements—a process whose reliability is not quantified via inter-rater agreement statistics like Cohen’s kappa. This omission is significant given that AI models trained on these labels will inherit any diagnostic uncertainty. Additionally, the substantial age gap between groups (mean 27.9 years for keratoconus vs. 33.5 years for normal) introduces a strong confounding variable; models may learn to discriminate based on age-associated corneal changes rather than pathological features. The paper states this reflects "the natural history of the disease" but does not propose strategies to control for this in predictive modeling. Finally, the custom Python preprocessing pipeline used to extract maps from PDF exports is not publicly released, limiting full reproducibility of the image generation workflow.

“Diagnostic labels were independently assigned by two junior ophthalmologists; in cases of disagreement, a senior ophthalmologist adjudicated the final label”
CornOrb paper · Experimental Design
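If the two junior graders' pre-adjudication labels were available, their agreement could be summarized with Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the chance agreement implied by each rater's marginal label frequencies. The plain-Python sketch below assumes access to both raters' labels; the example labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented labels: N = normal, K = keratoconus.
    grader_1 = ["N", "N", "K", "K", "N", "K", "N", "N", "K", "N"]
    grader_2 = ["N", "N", "K", "N", "N", "K", "N", "K", "K", "N"]
    print(round(cohens_kappa(grader_1, grader_2), 3))
```

Here the graders agree on 8 of 10 eyes, but chance agreement is already 0.52, so κ ≈ 0.58. This gap between raw agreement and kappa is why the statistic would be informative for this dataset.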
“Age (years): Normal 33.5±7.4, Keratoconus 27.9±7.1, p_BH<0.001”
CornOrb paper · Table 2
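A quick check on the age confound is to measure how well age alone separates the classes. The sketch below computes the ROC AUC of a "younger means keratoconus" rule via the Mann–Whitney formulation: the probability that a randomly chosen keratoconus eye is younger than a randomly chosen normal eye, with ties counted as one half. The helper name and toy ages are hypothetical. On the real per-eye data, an AUC well above 0.5 would indicate how much apparent model performance age could explain by itself.

```python
def age_only_auc(ages, labels, positive="keratoconus"):
    """AUC of the rule 'younger age => positive class' (Mann-Whitney statistic)."""
    pos = [a for a, y in zip(ages, labels) if y == positive]
    neg = [a for a, y in zip(ages, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one eye in each class")
    score = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                score += 1.0  # positive eye is younger: pair ranked correctly
            elif p == n:
                score += 0.5  # tie counts as half
    return score / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy ages, not CornOrb data.
    ages = [22, 25, 30, 28, 35, 40, 29, 31]
    labels = ["keratoconus"] * 4 + ["normal"] * 4
    print(age_only_auc(ages, labels))
```

The pairwise loop is quadratic, but with 565 × 889 pairs it still runs in well under a second. Reporting this number alongside any image or tabular model, or evaluating on an age-matched subset, would show whether the model adds signal beyond age.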
Evidence and comparison

The positioning against related work is fair: the authors accurately note that existing public datasets (Al-Timemy et al., Bakir, de Luna, Yousefi) predominantly feature Pentacam or OCT devices, leaving a genuine gap for Orbscan 3 data. However, the paper provides no evidence that the presented features actually enable accurate classification—no baseline accuracy, AUC, or confusion matrices are reported. Without even a simple logistic regression benchmark on the tabular features or a basic CNN on the images, readers must independently validate whether the dataset contains sufficient signal for the stated AI applications. The statistical differences in Table 2 (e.g., Kmax 53.1 D vs 43.7 D) are large and expected, but do not substitute for predictive validation.

“Bakir et al. released a Pentacam-based dataset... Al-Timemy et al. first introduced a large Pentacam dataset... Yousefi et al. released a large dataset of tabular parameters derived from OCT-based tomography”
CornOrb paper · Background
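The tabular baseline the review asks for can be small. The sketch below is a dependency-free logistic regression fitted by batch gradient descent on z-scored features. It is an illustrative assumption, not the authors' method; in practice a library implementation with patient-level cross-validation and AUC reporting would be preferable. The feature choice (Kmax and thinnest pachymetry), the hyperparameters, and the toy rows are all hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column; returns scaled rows plus the (mean, std) used per column."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    scaled = [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]
    return scaled, stats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss; X is a list of feature lists, y holds 0/1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss for one row is (p - y) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


if __name__ == "__main__":
    # Invented toy rows: [Kmax (D), thinnest pachymetry (um)]; 1 = keratoconus.
    X_raw = [[43.5, 540], [44.0, 530], [43.0, 555], [52.0, 450], [55.0, 430], [50.5, 470]]
    y = [0, 0, 0, 1, 1, 1]
    X, stats = standardize(X_raw)
    w, b = fit_logistic(X, y)
    print([round(p, 2) for p in predict_proba(w, b, X)])
```

When evaluating on held-out eyes, scale them with the training `stats` rather than re-standardizing, so that test-set statistics do not leak into the model.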
Reproducibility

The dataset itself is openly available via Zenodo (DOI: 10.5281/zenodo.17127265), satisfying FAIR principles for data accessibility. However, reproducibility is hampered by the absence of the custom Python pipeline used to parse PDF exports and standardize images to 560×560 pixels. Without this code, researchers cannot verify the cropping boundaries, color normalization, or potential information loss during PDF-to-PNG conversion. No suggested train/validation/test splits are provided, meaning every research group will use different splits, complicating fair benchmarking. Furthermore, the single-device (Orbscan 3), single-center (Algeria) nature means models trained on CornOrb may not generalize to Pentacam or Galilei devices without domain adaptation, a limitation the authors acknowledge but do not mitigate with cross-device calibration data.

“The data are openly available at Zenodo (10.5281/zenodo.17127265)”
CornOrb paper · Data Description
“A custom Python pipeline was developed to automatically parse the reports... All images were visually inspected”
CornOrb paper · Experimental Design
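Because no official splits are provided, one way to make results comparable, while keeping both eyes of a patient on the same side of the split, is a deterministic patient-level assignment. The sketch below hashes each patient ID with SHA-256 and assigns it to train, validation, or test by bucket. The 70/15/15 proportions, the salt string, and the helper names are arbitrary assumptions; neither the paper nor the Zenodo record specifies them.

```python
import hashlib


def assign_split(patient_id, salt="cornorb-v1", train=0.70, val=0.15):
    """Deterministically map a patient ID to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).hexdigest()
    # Use the first 8 hex digits as an approximately uniform number in [0, 1).
    u = int(digest[:8], 16) / 16 ** 8
    if u < train:
        return "train"
    if u < train + val:
        return "val"
    return "test"


def split_eyes(eye_keys, **kwargs):
    """Split (patient_id, eye) keys so both eyes of a patient share one split."""
    return {key: assign_split(key[0], **kwargs) for key in eye_keys}


if __name__ == "__main__":
    eyes = [("001", "OD"), ("001", "OS"), ("002", "OD"), ("003", "OS")]
    for key, split in split_eyes(eyes).items():
        print(key, split)
```

Hash bucketing gives only approximate proportions and does not stratify by diagnosis. A published list of patient IDs per split would be the more robust way to standardize benchmarking.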
Abstract

In this paper, we present CornOrb, a publicly accessible multimodal dataset of Orbscan corneal topography images and clinical annotations collected from patients in Algeria. The dataset comprises 1,454 eyes from 744 patients, including 889 normal eyes and 565 keratoconus cases. For each eye, four corneal maps are provided (axial curvature, anterior elevation, posterior elevation, and pachymetry), together with structured tabular data including demographic information and key clinical parameters such as astigmatism, maximum keratometry (Kmax), central and thinnest pachymetry, and anterior/posterior asphericity. All data were retrospectively acquired, fully anonymized, and pre-processed into standardized PNG and CSV formats to ensure direct usability for artificial intelligence research. This dataset represents one of the first large-scale Orbscan-based resources from Africa, specifically built to enable robust AI-driven detection and analysis of keratoconus using multimodal data. The data are openly available at Zenodo.
