Enhancing Brain Tumor Classification Using Vision Transformers with Colormap-Based Feature Representation on BRISC2025 Dataset

cs.CV Faisal Ahmed · Mar 22, 2026
AI Review
Plain-language introduction

This paper proposes applying Vision Transformers with colormap-based pseudo-color enhancement to brain tumor classification on the BRISC2025 MRI dataset. The core idea wraps a standard ViT-Base model with a Jet colormap preprocessing step to boost contrast, claiming 98.90% accuracy on four-class tumor classification. While the technique is sound in principle, serious copy-paste errors indicate the manuscript was likely templated from the author's prior Alzheimer's work without adequate revision.

Critical review
Verdict
Bottom line

The paper suffers from critical editorial errors that undermine its credibility: Figure 3's caption incorrectly describes 'Evaluation of TDA+DenseNet121 on the OASIS-1 dataset' when the paper actually uses ViT on BRISC2025. Table 2's caption similarly references 'Alzheimer's disease classification on OASIS' rather than brain tumors. These errors suggest the manuscript was templated from the author's nearly identical Alzheimer's paper (arXiv:2512.16964) with insufficient editing. Methodologically, the contribution is incremental: it applies the author's established colormap+ViT pipeline to a new dataset with no architectural innovation.

“Evaluation of TDA+DenseNet121 on the OASIS-1 dataset. (a) One-vs-rest ROC curves illustrating discrimination performance across the four Alzheimer’s disease classes.”
Ahmed, Figure 3 caption · Section 5, Figure 3
“Published accuracy results for four-class Alzheimer’s disease classification on OASIS or OASIS-derived MRI datasets.”
Ahmed, Table 2 caption · Section 5, Table 2
What holds up

The colormap enhancement strategy (converting grayscale MRI to pseudo-color via Jet colormap) is a valid preprocessing technique for adapting grayscale medical images to ImageNet-pretrained ViTs. The mathematical formulation of the Vision Transformer pipeline is standard and correctly described, following Dosovitskiy et al.'s architecture with multi-head self-attention $\text{MSA}(\mathbf{Z})=\text{Softmax}\left(\frac{\mathbf{QK}^{\top}}{\sqrt{d}}\right)\mathbf{V}$ and LayerNorm.

“$\text{MSA}(\mathbf{Z})=\text{Softmax}\left(\frac{\mathbf{QK}^{\top}}{\sqrt{d}}\right)\mathbf{V}$”
Ahmed, Section 3.3 · Section 3.3
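For readers less familiar with the quoted formula, the following is a minimal pure-Python sketch of single-head scaled dot-product attention, $\text{Softmax}(\mathbf{QK}^{\top}/\sqrt{d})\mathbf{V}$. It is an illustration only, not the paper's implementation: the helper names (`softmax`, `attention`) and the toy matrices are assumptions made for this example, and a real multi-head layer would add learned projections and concatenate several such heads.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Single-head scaled dot-product attention: Softmax(Q K^T / sqrt(d)) V.

    Q, K, V are lists of rows (one row per token); d is the key dimension.
    """
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy example: two tokens with 2-dimensional embeddings (values are made up).
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = attention(Q, K, V)
print(result)
```

Because the softmax weights are non-negative and sum to one, every output row lies between the rows of `V`; the first token leans toward the first value row and the second token toward the second.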
Main concerns

Beyond the copy-paste errors, the comparison with CNN baselines is potentially unfair: the paper does not clarify whether ResNet50, ResNet101, and EfficientNetB2 were trained on colormap-enhanced images or raw grayscale. If the baselines lacked this preprocessing, the claimed improvement (98.90% vs. 98.37% for EfficientNetB2) may reflect the preprocessing difference rather than architectural superiority. Reference [28] for the BRISC2025 dataset is completely empty, providing no citation details. The dataset description is vague ('approximately 6,000' samples with 'around 5,000 for training'), and the experimental setup describes training on an Apple M1 laptop without specifying GPU acceleration or exact compute time, raising reproducibility concerns.

Additionally, the novelty is minimal: the methodology is identical to the author's prior work on Alzheimer's classification (arXiv:2512.16964), including an equivalent preprocessing equation in slightly different notation and nearly identical text, suggesting a 'salami slicing' approach rather than a substantive new contribution.

“$\mathbf{I}_{\text{rgb}}=\text{Colormap}\left(\frac{\mathbf{I}_{\text{gray}}}{255}\right)$”
Ahmed, Alzheimer's paper · arXiv:2512.16964, Section 3.1
“$\mathbf{I}_{\text{color}}=\mathcal{C}(\mathbf{I}_{\text{norm}})$”
Ahmed, current paper · Section 3.1
“[28] Cited by: §4.1”
Ahmed, References · Reference [28]
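Both preprocessing formulas quoted above describe the same two steps: scale 8-bit intensities to the unit interval, then pass each value through a colormap. A minimal sketch of that idea follows. It assumes a common piecewise-linear approximation of Jet rather than the exact lookup table of any plotting library, and the function names (`normalize`, `jet_approx`, `to_pseudo_color`) and the sample row are hypothetical, chosen only for this illustration.

```python
def normalize(gray):
    """Scale 8-bit intensities (0-255) to the unit interval."""
    return [[px / 255.0 for px in row] for row in gray]

def _clamp(x):
    """Clip a value to [0, 1]."""
    return max(0.0, min(1.0, x))

def jet_approx(t):
    """Piecewise-linear approximation of the Jet colormap for t in [0, 1].

    ASSUMPTION: this closed form is a widely used approximation, not the
    exact table used by the paper or by any specific library.
    """
    r = _clamp(1.5 - abs(4.0 * t - 3.0))
    g = _clamp(1.5 - abs(4.0 * t - 2.0))
    b = _clamp(1.5 - abs(4.0 * t - 1.0))
    return (r, g, b)

def to_pseudo_color(gray):
    """Map a 2-D grayscale image to rows of (r, g, b) tuples in [0, 1]."""
    return [[jet_approx(t) for t in row] for row in normalize(gray)]

# Made-up single-row "image": darkest, mid-gray, and brightest pixel.
gray_slice = [[0, 128, 255]]
rgb = to_pseudo_color(gray_slice)
print(rgb)
```

The darkest pixel maps to dark blue, the mid-gray pixel to a green-dominated colour, and the brightest pixel to dark red, which is the contrast-spreading behaviour the review credits as a valid way to feed grayscale MRI to a three-channel pretrained model.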
Evidence and comparison

The 98.90% accuracy claim is supported by Table 2, but the baselines appear to come from a single prior paper (Fateh et al., cited as [27]) rather than being re-implemented by the author. The paper explicitly states 'No Data Augmentation' was used for the ViT, yet does not confirm whether the baseline CNNs from Fateh et al. also omitted augmentation. The margin over EfficientNetB2 (0.53 percentage points) is slim: on a test set of roughly 1,000 images it corresponds to about five images, and because no confidence intervals are reported, the difference may fall within statistical variance. The AUC of 99.97% is unusually high for a medical imaging task and is hard to interpret without confidence intervals or repeated runs.

“No Data Augmentation: Unlike many deep learning approaches... the proposed colormap-enhanced Vision Transformer framework is trained without applying any data augmentation.”
Ahmed, Section 4.2 · Section 4.2
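To make the point about the slim margin concrete, the sketch below computes 95% Wilson score intervals for the two reported accuracies. It rests on assumptions not confirmed by the paper: the test set is taken as exactly 1,000 images (the paper says "around" 1,000), and the two accuracies are treated as independent binomial proportions, which is only a rough check. If both models were scored on the same images, a paired test such as McNemar's would be more appropriate.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# ASSUMPTION: "around ... 1,000 for testing" is taken as exactly 1,000.
N_TEST = 1000

vit_lo, vit_hi = wilson_interval(0.9890, N_TEST)   # reported ViT accuracy
eff_lo, eff_hi = wilson_interval(0.9837, N_TEST)   # reported EfficientNetB2 accuracy

# Two intervals overlap when each one starts before the other ends.
intervals_overlap = vit_lo <= eff_hi and eff_lo <= vit_hi

print(f"ViT:            [{vit_lo:.4f}, {vit_hi:.4f}]")
print(f"EfficientNetB2: [{eff_lo:.4f}, {eff_hi:.4f}]")
print("overlap:", intervals_overlap)
```

Under these assumptions the two intervals overlap, which is consistent with the concern that a 0.53-point gap on a test set of this size does not by itself establish that one model is better.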
Reproducibility

Reproduction is severely hindered by insufficient experimental detail. The Apple M1 laptop setup (8-core CPU, 16 GB of unified memory) is described without stating whether the M1's integrated GPU was used or how long training took; training ViT-Base on the CPU alone would likely be very slow. No code repository is mentioned. The BRISC2025 dataset reference is incomplete. The paper states 'around 5,000 images for training and 1,000 for testing' but later uses an 80:20 split for a 6,000-image dataset, creating ambiguity (4,800 vs. 5,000 training samples). Early stopping with patience $P=15$ is mentioned but validation set size and split strategy are unspecified.

“All experiments were conducted on a personal laptop equipped with an Apple M1 system-on-chip, featuring an 8-core CPU (4 performance cores and 4 efficiency cores) and 16 GB of unified memory.”
Ahmed, Section 4.2 · Section 4.2
“around 5,000 images for training and 1,000 for testing”
Ahmed, Section 4.1 · Section 4.1
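The split ambiguity can be checked with simple arithmetic. The sketch below assumes the dataset holds exactly 6,000 images (the paper says "approximately"), and `split_sizes` is a hypothetical helper written for this illustration.

```python
def split_sizes(total, train_fraction):
    """Return (n_train, n_test) for a given total and training fraction."""
    n_train = round(total * train_fraction)
    return n_train, total - n_train

# ASSUMPTION: "approximately 6,000" is taken as exactly 6,000 images.
TOTAL = 6000

ratio_split = split_sizes(TOTAL, 0.80)   # what an 80:20 split would give
stated_split = (5000, 1000)              # what Section 4.1 reports ("around")

# Training fraction implied by the stated counts, and the gap in images.
implied_fraction = stated_split[0] / sum(stated_split)
gap = stated_split[0] - ratio_split[0]

print("80:20 split:     ", ratio_split)
print("stated split:    ", stated_split)
print(f"implied fraction: {implied_fraction:.3f}")
print("training-set gap:", gap)
```

The stated counts imply a training fraction of about 0.833 rather than 0.80, a difference of 200 training images, so the two descriptions cannot both be exact.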
Abstract

Accurate classification of brain tumors from magnetic resonance imaging (MRI) plays a critical role in early diagnosis and effective treatment planning. In this study, we propose a deep learning framework based on Vision Transformers (ViT) enhanced with colormap-based feature representation to improve multi-class brain tumor classification performance. The proposed approach leverages the ability of transformer architectures to capture long-range dependencies while incorporating color mapping techniques to emphasize important structural and intensity variations within MRI scans. Experiments are conducted on the BRISC2025 dataset, which includes four classes: glioma, meningioma, pituitary tumor, and non-tumor cases. The model is trained and evaluated using standard performance metrics such as accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). The proposed method achieves a classification accuracy of 98.90%, outperforming baseline convolutional neural network models including ResNet50, ResNet101, and EfficientNetB2. In addition, the model demonstrates strong generalization capability with an AUC of 99.97%, indicating high discriminative performance across all classes. These results highlight the effectiveness of combining Vision Transformers with colormap-based feature enhancement for accurate and robust brain tumor classification and suggest strong potential for clinical decision support applications.
