CornOrb: A Multimodal Dataset of Orbscan Corneal Topography and Clinical Annotations for Keratoconus Detection
CornOrb addresses a persistent gap in ophthalmic AI by providing one of the first large-scale, publicly accessible Orbscan 3 corneal topography datasets. The collection comprises 1,454 eyes from 744 Algerian patients, offering four standardized corneal maps (axial curvature, anterior/posterior elevation, pachymetry) alongside structured clinical parameters including Kmax, astigmatism, and asphericity. By releasing this multimodal resource in standardized PNG and CSV formats, the authors aim to enable robust AI-driven detection of keratoconus using device-specific data from an underrepresented African population.
CornOrb is a well-curated dataset descriptor that delivers on its promise of filling a specific niche—publicly available Orbscan 3 data for keratoconus research. The paper is rigorous in its statistical characterization (Welch’s t-tests with Benjamini–Hochberg correction) and adheres to FAIR principles. However, as a pure data descriptor without any baseline machine learning experiments, it stops short of demonstrating the dataset’s discriminative utility or interoperability with existing algorithms. The moderate class imbalance (889 normal vs. 565 keratoconus eyes, roughly 61:39) is acknowledged but presents a manageable rather than severe challenge for model training.
The dataset structure is methodologically sound and immediately usable: 560×560 pixel PNG images with a consistent naming convention (patientID_eye_maptype.png) are linked to a comprehensive CSV containing 15 clinical variables. The hierarchical organization by patient code and laterality (OD/OS) makes it straightforward to keep both eyes of a patient together, which is a prerequisite for handling inter-eye correlation correctly. The authors appropriately highlight that this is a geographically distinct cohort (North Africa) that is underrepresented in existing keratoconus datasets, which are predominantly Pentacam-based. The data dictionary in Table 1 is complete, with precise units and descriptions for each variable.
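As an illustration of how the naming convention supports per-eye and per-patient grouping, the sketch below parses file names of the form patientID_eye_maptype.png and collects the maps belonging to each eye. It assumes that the patient ID and the eye code contain no underscores and that the eye code is OD or OS. The example file names and map-type tokens (axial, pachymetry) are hypothetical and not taken from the dataset.

```python
from collections import defaultdict
from pathlib import Path


def parse_map_name(filename):
    """Split 'patientID_eye_maptype.png' into (patient_id, eye, map_type).

    Assumption: patient ID and eye code contain no underscores;
    the map type may (e.g. 'anterior_elevation').
    """
    stem = Path(filename).stem
    parts = stem.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected file name: {filename!r}")
    patient_id, eye, map_type = parts
    if eye not in {"OD", "OS"}:
        raise ValueError(f"unexpected eye code {eye!r} in {filename!r}")
    return patient_id, eye, map_type


def group_maps_by_eye(filenames):
    """Return {(patient_id, eye): {map_type: filename}}."""
    eyes = defaultdict(dict)
    for name in filenames:
        patient_id, eye, map_type = parse_map_name(name)
        eyes[(patient_id, eye)][map_type] = name
    return dict(eyes)


# Hypothetical file names, for illustration only.
example = ["P001_OD_axial.png", "P001_OD_pachymetry.png", "P001_OS_axial.png"]
for key, maps in group_maps_by_eye(example).items():
    print(key, sorted(maps))
```

Keying on (patient_id, eye) rather than on the file name keeps the four maps of one eye together and makes the patient ID available for any later patient-level grouping.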
The diagnostic labels rely on interpretation by two junior ophthalmologists, with senior adjudication of disagreements, but the reliability of this process is not quantified with an inter-rater agreement statistic such as Cohen’s kappa. This omission is significant because AI models trained on these labels will inherit any diagnostic uncertainty. Additionally, the substantial age gap between groups (mean 27.9 years for keratoconus vs. 33.5 years for normal) introduces a potential confounder: models may learn to discriminate based on age-associated corneal changes rather than pathological features. The paper states this reflects "the natural history of the disease" but does not propose strategies to control for it in predictive modeling. Finally, the custom Python preprocessing pipeline used to extract maps from PDF exports is not publicly released, limiting full reproducibility of the image generation workflow.
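For reference, Cohen's kappa compares the observed agreement p_o between two raters with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of the computation follows. The rater labels in the example are invented for illustration, since the paper does not release per-rater labels.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Invented labels: "KC" = keratoconus, "N" = normal.
rater_1 = ["KC", "KC", "N", "N"]
rater_2 = ["KC", "N", "N", "N"]
print(cohens_kappa(rater_1, rater_2))
```

Reporting this value for the two junior readers, before adjudication, would let users judge how much label noise to expect.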
The positioning against related work is fair: the authors accurately note that existing public datasets (Al-Timemy et al., Bakir, de Luna, Yousefi) predominantly feature Pentacam or OCT devices, leaving a genuine gap for Orbscan 3 data. However, the paper provides no evidence that the presented features actually enable accurate classification—no baseline accuracy, AUC, or confusion matrices are reported. Without even a simple logistic regression benchmark on the tabular features or a basic CNN on the images, readers must independently validate whether the dataset contains sufficient signal for the stated AI applications. The statistical differences in Table 2 (e.g., Kmax 53.1 D vs 43.7 D) are large and expected, but do not substitute for predictive validation.
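Even short of a logistic regression, a threshold-free check on a single tabular feature is cheap to report. The sketch below is a simpler stand-in for the benchmark suggested above, not a replacement for it: it computes the AUC of one feature as the probability that a randomly chosen keratoconus eye has a higher value than a randomly chosen normal eye, with ties counted as one half. The Kmax values in the example are made up for illustration and are not drawn from CornOrb.

```python
def single_feature_auc(pos_values, neg_values):
    """AUC of one feature used directly as a score.

    Equals P(pos > neg) + 0.5 * P(pos == neg) over all positive/negative
    pairs (the Mann-Whitney formulation of the ROC AUC).
    """
    if not pos_values or not neg_values:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_values:
        for n in neg_values:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_values) * len(neg_values))


# Made-up Kmax values in dioptres, for illustration only.
kmax_keratoconus = [52.0, 55.5, 47.0]
kmax_normal = [43.0, 44.5, 47.0]
print(single_feature_auc(kmax_keratoconus, kmax_normal))
```

The pairwise loop is quadratic, which is acceptable at this dataset's size (565 × 889 pairs); a rank-based implementation would be preferable for much larger cohorts.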
The dataset itself is openly available via Zenodo (DOI: 10.5281/zenodo.17127265), satisfying FAIR principles for data accessibility. However, reproducibility is hampered by the absence of the custom Python pipeline used to parse PDF exports and standardize images to 560×560 pixels. Without this code, researchers cannot verify the cropping boundaries, color normalization, or potential information loss during PDF-to-PNG conversion. No suggested train/validation/test splits are provided, meaning every research group will use different splits, complicating fair benchmarking. Furthermore, the single-device (Orbscan 3), single-center (Algeria) nature means models trained on CornOrb may not generalize to Pentacam or Galilei devices without domain adaptation, a limitation the authors acknowledge but do not mitigate with cross-device calibration data.
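One way to obtain splits that any group can reproduce without a shared file is to derive the split from a hash of the patient ID, so that both eyes of a patient always land in the same partition. The following is a minimal sketch under stated assumptions: the function name, the salt string, the 70/15/15 proportions, and the example patient IDs are all hypothetical and not part of the CornOrb release.

```python
import hashlib


def assign_split(patient_id, val_frac=0.15, test_frac=0.15, salt="cornorb-split-v1"):
    """Deterministically map a patient ID to 'train', 'val', or 'test'.

    The assignment depends only on the patient ID and the salt, so every
    eye and every map of the same patient receives the same split.
    """
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).hexdigest()
    # Interpret the first 32 bits of the hash as a number in [0, 1).
    u = int(digest[:8], 16) / 2**32
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


# Hypothetical (patient_id, eye) rows, for illustration only.
rows = [("P001", "OD"), ("P001", "OS"), ("P002", "OD")]
for patient_id, eye in rows:
    print(patient_id, eye, assign_split(patient_id))
```

Hash-based assignment only approximates the target proportions and does not stratify by diagnosis. A published split file, stratified at the patient level, would be the more controlled alternative.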
In this paper, we present CornOrb, a publicly accessible multimodal dataset of Orbscan corneal topography images and clinical annotations collected from patients in Algeria. The dataset comprises 1,454 eyes from 744 patients, including 889 normal eyes and 565 keratoconus cases. For each eye, four corneal maps are provided (axial curvature, anterior elevation, posterior elevation, and pachymetry), together with structured tabular data including demographic information and key clinical parameters such as astigmatism, maximum keratometry (Kmax), central and thinnest pachymetry, and anterior/posterior asphericity. All data were retrospectively acquired, fully anonymized, and pre-processed into standardized PNG and CSV formats to ensure direct usability for artificial intelligence research. This dataset represents one of the first large-scale Orbscan-based resources from Africa, specifically built to enable robust AI-driven detection and analysis of keratoconus using multimodal data. The data are openly available at Zenodo.