BHDD: A Burmese Handwritten Digit Dataset

cs.CV cs.CL Swan Htet Aung, Hein Htet, Htoo Say Wah Khaing, Thuya Myo Nyunt · Mar 23, 2026


AI Review

Plain-language introduction

This paper introduces BHDD, the first public benchmark dataset for handwritten Burmese digits. Myanmar script's distinctive circular letterforms—originally developed for writing on palm leaves—create recognition challenges distinct from Latin digits, with pairs like 0 and 1 differing only by whether a circle is closed. The authors release 87,561 verified images (28×28 grayscale, MNIST-compatible format) from over 150 contributors, with writer-independent train/test splits and baseline models reaching up to 99.83% accuracy.

Critical review

Verdict

Bottom line

BHDD is a solid, carefully constructed dataset paper that fills a genuine gap in handwritten digit benchmarks. The methodology is sound: contributors are explicitly split between training and test sets to avoid writer overlap, quality assurance involves both automated deduplication and manual verification, and the mobile preprocessing app with real-time thresholding is a practical innovation for crowdsourced collection. The baseline experiments are sufficient to validate that the dataset is learnable while revealing script-specific challenge patterns.

“Approximately 120 contributors were assigned to the training set and the remaining roughly 30 to the test set, so no writer's handwriting appears in both splits.”

paper · Section III-A

“All 87,561 images in the final dataset are verified unique: no exact duplicates exist within or across the training and test sets.”

paper · Section III-A
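The writer-independent split quoted above can be illustrated with a minimal sketch. It assumes each sample carries a contributor identifier. The record layout, the `split_by_writer` helper, and the seed are hypothetical and are not taken from the paper, which does not describe its assignment procedure in the excerpts quoted here.

```python
import random

def split_by_writer(samples, n_test_writers, seed=0):
    """Split (writer_id, image, label) records so that no writer spans both sets.

    Hypothetical helper: whole writers are assigned to one side of the split.
    """
    writers = sorted({writer_id for writer_id, _, _ in samples})
    rng = random.Random(seed)  # arbitrary seed, for a repeatable toy example
    rng.shuffle(writers)
    test_writers = set(writers[:n_test_writers])
    train = [s for s in samples if s[0] not in test_writers]
    test = [s for s in samples if s[0] in test_writers]
    return train, test

# Toy data: 150 made-up writers with 5 samples each (images omitted).
samples = [(f"w{w:03d}", None, (w + i) % 10) for w in range(150) for i in range(5)]
train, test = split_by_writer(samples, n_test_writers=30)

train_writers = {s[0] for s in train}
test_writers = {s[0] for s in test}
```

Splitting at the writer level rather than the image level is what prevents a model from scoring well simply by recognizing a familiar hand.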

What holds up

The data collection methodology demonstrates care for real-world noise: using phone cameras rather than scanners introduces lighting and angle variation, and the Android app with adaptive thresholding allowed contributors to verify digit extraction quality. The statistical analysis goes beyond simple pixel histograms to examine per-class ink coverage (30.4% for class 2 up to 56.8% for class 0), mean images showing consistent stroke patterns, and variance heatmaps identifying where handwriting styles diverge most. The script-specific confusion analysis between visually similar pairs (0/1, 0/8) provides actionable insights for future model development.

“Mean pixel intensity per class ranges from 12.5 (class 2) to 26.5 (class 0). Ink coverage (the fraction of non-zero pixels) ranges from 30.4% (class 2, a thin hook) to 56.8% (class 0, a full circle).”

paper · Section III-D

“Digits 0 and 1 are the hardest pair: 24 misclassifications between them (in both directions combined). The only difference is whether the circle is closed or has a small gap.”

paper · Section III-E
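The two per-class statistics quoted from Section III-D can be computed along these lines. This is a sketch that uses the quoted definition of ink coverage (fraction of non-zero pixels) and assumes the per-class figure is the average of per-image values; the function names and toy images are illustrative only.

```python
def ink_stats(image):
    """Return (mean intensity, ink coverage) for one grayscale image.

    `image` is a list of rows of integer pixel values in 0..255.
    Ink coverage is the fraction of non-zero pixels.
    """
    pixels = [p for row in image for p in row]
    mean_intensity = sum(pixels) / len(pixels)
    coverage = sum(1 for p in pixels if p != 0) / len(pixels)
    return mean_intensity, coverage

def per_class_stats(images, labels):
    """Average the per-image statistics within each class label."""
    totals = {}
    for image, label in zip(images, labels):
        mean_intensity, coverage = ink_stats(image)
        acc = totals.setdefault(label, [0.0, 0.0, 0])
        acc[0] += mean_intensity
        acc[1] += coverage
        acc[2] += 1
    return {label: (m / n, c / n) for label, (m, c, n) in totals.items()}

# Toy 4x4 "images": one blank, one with the left half fully inked.
blank = [[0] * 4 for _ in range(4)]
half = [[255, 255, 0, 0] for _ in range(4)]
stats = per_class_stats([blank, half], [0, 1])
```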

Main concerns

The test set is heavily imbalanced: class 0 has 6,856 samples and class 9 has 389, a ratio of roughly 17.6:1, which complicates interpretation of overall accuracy even though macro-F1 is also reported. With the improved CNN producing only 47 misclassifications across 27,561 test samples, the error analysis is informative but has limited statistical power for drawing general conclusions about difficult pairs. Additionally, most contributors came from Yangon, with smaller representation from other regions, which may limit demographic diversity. The paper notes these limitations but does not quantify their impact on generalization.

“The test set is left unbalanced... with per-class counts ranging from 6,856 (class 0) to 389 (class 9).”

paper · Section III-C

“Of 27,561 test samples, only 47 are misclassified.”

paper · Section IV-C

“Most contributors were based in Yangon, with smaller groups from Mandalay, Nay Pyi Taw, Shan State, and the United States.”

paper · Section III-A
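To put numbers on the two quantitative concerns above, the following sketch computes the class imbalance ratio and a 95% Wilson score interval for the overall error rate from the counts quoted in the review. The Wilson interval is this sketch's choice and is not something the paper reports. It treats test samples as independent, which is optimistic because samples from the same writer are correlated.

```python
import math

def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Counts quoted from the paper (Sections III-C, III-E, and IV-C).
imbalance_ratio = 6856 / 389       # class 0 vs. class 9 in the test set
error_rate = 47 / 27561            # improved CNN
low, high = wilson_interval(47, 27561)
zero_one_share = 24 / 47           # share of all errors on the 0/1 pair

print(f"imbalance ratio: {imbalance_ratio:.1f}:1")
print(f"error rate: {error_rate:.4%} (95% Wilson: {low:.4%} to {high:.4%})")
print(f"0/1 pair share of errors: {zero_one_share:.0%}")
```

With so few errors, per-pair counts such as the 24 confusions between 0 and 1 rest on very small samples, so differences between pairs should be read with caution.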

Evidence and comparison

The placement within the MNIST family of datasets is appropriate, and the 28×28×1 format enables direct compatibility with existing data loaders. The comparison to Kuzushiji-MNIST's finding that 'different scripts need their own benchmarks' is fair and well supported by the paper's own confusion analysis, which shows distinctive error patterns (particularly the 0/1 ambiguity specific to circular scripts). The baselines are standard but adequate for a dataset paper; the progression from MLP (99.40%) to CNN (99.75%) to improved CNN (99.83%) shows that augmentation and batch normalization provide the expected gains without requiring exotic architectures.

“Kuzushiji-MNIST... found that the error patterns looked nothing like those on Latin digits—different scripts need their own benchmarks.”

paper · Section II

“MLP: 0.9940 accuracy, CNN: 0.9975 accuracy, Improved CNN: 0.9983 accuracy.”

paper · Table I
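As a rough consistency check, the Table I accuracies can be converted into approximate error counts on the 27,561-sample test set. Only the figure of 47 errors for the improved CNN is stated in the paper. The other two counts are back-of-envelope estimates derived from accuracies rounded to four decimals.

```python
TEST_SIZE = 27_561  # test set size reported in the paper

# Accuracies as quoted from Table I (rounded to four decimals).
reported_accuracy = {"MLP": 0.9940, "CNN": 0.9975, "Improved CNN": 0.9983}

# Approximate number of misclassified test samples implied by each accuracy.
implied_errors = {
    name: round(TEST_SIZE * (1 - acc)) for name, acc in reported_accuracy.items()
}

for name, errors in implied_errors.items():
    print(f"{name}: about {errors} errors")
```

Viewed as counts, the gap between the two CNNs amounts to a few dozen test images, which is worth keeping in mind when comparing models this close to the ceiling.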

Reproducibility

Reproducibility is excellent. The dataset is publicly available under CC BY-SA 4.0 on GitHub with both pickle and IDX formats. Baseline code, exploration scripts, and usage examples are included. Experimental details are thorough: random seed (42), exact layer dimensions (256/128 for MLP, 32/64 filters for CNN), dropout rates (0.25 spatial, 0.5 dense), learning rate ($10^{-3}$), augmentation parameters ($\pm 15°$ rotation, $\pm 2$ px translation, 0.9–1.1× scale), and optimizer settings are all specified. The use of standard frameworks (scikit-learn, PyTorch) further ensures independent reproduction is straightforward.

“BHDD is at https://github.com/baseresearch/BHDD under CC BY-SA 4.0, in pickle and IDX formats. The repository includes exploration scripts, baseline code, and usage examples.”

paper · Section V

“All runs used seed 42... Trained 15 epochs with Adam ($\text{lr}=10^{-3}$)... rotation $\pm 15°$, translation $\pm 2$ px, scale 0.9–1.1×”

paper · Section IV-A
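Since the dataset ships in IDX format, a minimal reader may help illustrate what MNIST compatibility means in practice. The sketch below assumes the unsigned-byte IDX layout used by the original MNIST files (two zero bytes, a type code of 0x08, the number of dimensions, then big-endian dimension sizes). The actual file names in the BHDD repository are not given here, so the example parses an in-memory buffer rather than a named file.

```python
import struct

def parse_idx_ubyte(data: bytes):
    """Parse an unsigned-byte IDX buffer and return (dims, payload bytes)."""
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    header_end = 4 + 4 * ndim
    dims = struct.unpack(">" + "I" * ndim, data[4:header_end])
    payload = data[header_end:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match header dimensions")
    return dims, payload

# Build a tiny synthetic buffer: two blank 28x28 images.
header = struct.pack(">HBB", 0, 0x08, 3) + struct.pack(">III", 2, 28, 28)
buffer = header + bytes(2 * 28 * 28)

dims, payload = parse_idx_ubyte(buffer)
first_image = payload[: 28 * 28]  # row-major pixels of image 0
```

To read a real file, the same function can be applied to the bytes returned by `open(path, "rb").read()`, with `path` pointing at whichever IDX file the repository provides.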

Abstract

We introduce the Burmese Handwritten Digit Dataset (BHDD), a collection of 87,561 grayscale images of handwritten Burmese digits in ten classes. Each image is 28x28 pixels, following the MNIST format. The training set has 60,000 samples split evenly across classes; the test set has 27,561 samples with class frequencies as they arose during collection. Over 150 people of different ages and backgrounds contributed samples. We analyze the dataset's class distribution, pixel statistics, and morphological variation, and identify digit pairs that are easily confused due to the round shapes of the Myanmar script. Simple baselines (an MLP, a two-layer CNN, and an improved CNN with batch normalization and augmentation) reach 99.40%, 99.75%, and 99.83% test accuracy respectively. BHDD is available under CC BY-SA 4.0 at https://github.com/baseresearch/BHDD
