Elif Naz Ögünç

Bioengineering, Neuroscience

Self-Project: Classifying Brain Tumors with Deep Learning and MRI Scans


September 27, 2025


As a bioengineering student with a passion for neuroscience and neuroimaging, I'm constantly exploring the intersection of artificial intelligence and medicine. One of the most promising frontiers is using deep learning to analyze medical images, offering faster, more accessible, and increasingly accurate diagnostics.
For this project, I built and evaluated a deep learning model to perform multi-class classification of brain tumors from MRI scans. The goal was to correctly identify four distinct categories:
  • Glioma

  • Meningioma

  • Pituitary Tumor

  • Healthy (no tumor)

This post outlines the process, the models I used, and the crucial insights gained from "opening the black box" with explainable AI (XAI).

The Challenge & The Dataset

The project used a public dataset from Kaggle, "Brain Tumor MRI Scans," containing 7,023 MRI images split across the four classes. The dataset is relatively balanced, which is an excellent starting point for a classification task.
My workflow was built entirely in Python using TensorFlow and Keras.

My Approach: From Baseline to Transfer Learning

To build a reliable model, I followed a systematic approach, starting with rigorous preprocessing and establishing a baseline before moving to a more complex architecture.
1. Data Preprocessing & Augmentation
This is arguably the most critical step. Raw images are not suitable for direct input into a neural network.
  • Splitting: I divided the data into an 80% training set (5,619 images) and a 20% validation set (1,404 images) to test the model's performance on unseen data.

  • Standardization: All images were resized to 128x128 pixels and pixel values were normalized from the [0, 255] range to [0, 1]. This ensures the model learns more stably.

  • Augmentation: To prevent the model from "memorizing" the training images (overfitting) and to make it more robust, I applied on-the-fly data augmentation during training. This included:

    • Random horizontal flips

    • Random rotations (a factor of 0.08, i.e. up to roughly ±29°)

    • Random zooming (up to 8%)

    • Random contrast adjustments
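The preprocessing and augmentation steps above can be sketched with Keras preprocessing layers. This is a minimal illustration rather than my exact notebook code: the contrast factor and function names are assumptions, while the 128x128 size, [0, 1] scaling, and ~8% rotation/zoom come from the description above.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = 128  # target resolution

# On-the-fly augmentation, applied only during training
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.08),  # up to ~8% of a full turn
    layers.RandomZoom(0.08),      # up to ~8% zoom
    layers.RandomContrast(0.1),   # contrast factor is an assumption
])

def preprocess(image, label):
    """Resize to 128x128 and scale pixel values from [0, 255] to [0, 1]."""
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = tf.cast(image, tf.float32) / 255.0
    return image, label
```

In a `tf.data` pipeline, `preprocess` would typically be applied with `dataset.map(...)`, while `data_augmentation` is called only on training batches so the validation set stays untouched.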
2. Model 1: The Baseline CNN
I first built a simple Convolutional Neural Network (CNN) from scratch. This model consisted of three blocks of Conv2D and MaxPooling layers, followed by a Dropout layer and Dense layers for classification. This baseline model set a benchmark accuracy of around 65% on the validation set.
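A baseline of this shape might look like the following sketch. The three Conv2D/MaxPooling blocks, Dropout, and Dense head match the description above; the filter counts, dropout rate, and optimizer choice are my assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4  # glioma, meningioma, pituitary, healthy

def build_baseline_cnn(input_shape=(128, 128, 3)):
    """Three Conv2D/MaxPooling blocks, then Dropout and Dense layers."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dropout(0.5),  # dropout rate is an assumption
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```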
3. Model 2: Transfer Learning with MobileNetV2
Instead of training a large network from randomly initialized weights, transfer learning lets us reuse the "knowledge" of a model pre-trained on a massive dataset (like ImageNet).
  • I used the MobileNetV2 architecture as a "feature extractor."

  • I froze the weights of the pre-trained base model.

  • I added a custom "head" on top: a GlobalAveragePooling2D layer, a Dropout layer (for regularization), a Dense layer (128 units), and a final 4-unit Dense layer with a softmax activation for our four classes.
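Putting those three bullets together, a frozen-base MobileNetV2 classifier can be sketched as follows. The GlobalAveragePooling2D, Dropout, 128-unit Dense, and 4-unit softmax head are as described above; the dropout rate and the rescaling step (my images are in [0, 1], MobileNetV2 expects [-1, 1]) are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4

def build_transfer_model(input_shape=(128, 128, 3), weights="imagenet"):
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False  # freeze the pre-trained feature extractor

    inputs = layers.Input(shape=input_shape)
    # Inputs are normalized to [0, 1]; MobileNetV2 expects [-1, 1]
    x = layers.Rescaling(2.0, offset=-1.0)(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)  # dropout rate is an assumption
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```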

Results and Interpretation

After training the transfer-learning model, I evaluated its performance on the validation set. The initial frozen run actually fell slightly short of the baseline (roughly 60% vs. 65% accuracy), but the class-by-class metrics reveal a more detailed story.

Classification Report (Baseline CNN)
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| glioma | 0.85 | 0.32 | 0.47 | 287 |
| healthy | 0.60 | 0.97 | 0.74 | 436 |
| meningioma | 0.70 | 0.39 | 0.50 | 325 |
| pituitary | 0.67 | 0.76 | 0.71 | 356 |
| Accuracy | | | 0.65 | 1404 |
| Macro Avg | 0.70 | 0.61 | 0.61 | 1404 |
| Weighted Avg | 0.69 | 0.65 | 0.62 | 1404 |
(Note: The report above is from the baseline model (cell 33), which performed better than the initial transfer learning model (cell 38). It's common for a simple CNN to outperform a frozen transfer model on specialized medical data if not fine-tuned.)
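A per-class report like the one above can be produced with scikit-learn's `classification_report`. The tiny arrays below are made-up stand-ins for the real validation labels and `model.predict` outputs:

```python
import numpy as np
from sklearn.metrics import classification_report

CLASS_NAMES = ["glioma", "healthy", "meningioma", "pituitary"]

# Stand-ins for the real validation labels and softmax outputs
y_true = np.array([0, 1, 2, 3, 1, 1])
y_prob = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.80, 0.05, 0.05],
    [0.20, 0.20, 0.50, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.30, 0.40, 0.20, 0.10],
    [0.60, 0.20, 0.10, 0.10],
])

# Predicted class = argmax over the softmax scores
y_pred = y_prob.argmax(axis=1)
print(classification_report(y_true, y_pred, target_names=CLASS_NAMES))
```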

ROC Curve Analysis
ROC curves, computed one-vs-rest for each class, are a great tool for evaluating multi-class models. The Area Under the Curve (AUC) measures how well the model separates each class from all the others.
  • Glioma: 0.87 AUC

  • Healthy: 0.86 AUC

  • Pituitary: 0.86 AUC

  • Meningioma: 0.76 AUC

All classes are well above the 0.5 (random guess) baseline. This shows the model has learned strong differentiating features for Glioma, Healthy, and Pituitary tumors, but finds Meningioma the most challenging to distinguish.
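The per-class AUCs above can be computed one-vs-rest with scikit-learn. Here is a small helper, assuming integer labels and softmax scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

CLASS_NAMES = ["glioma", "healthy", "meningioma", "pituitary"]

def per_class_auc(y_true, y_prob):
    """One-vs-rest AUC per class from integer labels and softmax scores."""
    # Turn integer labels into one column of 0/1 labels per class
    y_bin = label_binarize(y_true, classes=np.arange(len(CLASS_NAMES)))
    return {name: roc_auc_score(y_bin[:, i], y_prob[:, i])
            for i, name in enumerate(CLASS_NAMES)}
```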

Going Deeper: Explainable AI (XAI) with Grad-CAM

In medical imaging, accuracy isn't enough. We must ensure the model is making the right decision for the right reasons. Is it looking at the tumor, or is it focusing on an artifact in the corner of the MRI?
I used Grad-CAM (Gradient-weighted Class Activation Mapping) to create a "heat map" showing exactly which parts of the image the model used to make its prediction.
The Grad-CAM results were fascinating. For most correct predictions, the model clearly focused its attention on the tumorous region. This builds trust in the model's "reasoning."
The heat maps also help explain why meningioma was harder to classify. Meningiomas often grow on the brain's outer membranes (the meninges), giving them a different visual profile from gliomas, which are embedded within the brain tissue itself. The model may be confusing features of the skull or dura mater with the tumor.
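For reference, a standard Grad-CAM implementation with `tf.GradientTape` looks roughly like this. It is a generic sketch rather than my notebook's exact code; `last_conv_layer_name` must be the name of the model's final convolutional layer.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Grad-CAM heatmap for a single image of shape (H, W, C)."""
    conv_layer = model.get_layer(last_conv_layer_name)
    # A model that maps the input to both the conv feature maps and predictions
    grad_model = tf.keras.models.Model(
        model.inputs, [conv_layer.output, model.output])

    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))  # predicted class
        class_score = preds[:, class_index]

    # Gradient of the class score w.r.t. the last conv feature maps
    grads = tape.gradient(class_score, conv_out)
    # Weight each feature map by its average gradient, then combine
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    heatmap = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    heatmap = tf.nn.relu(heatmap)               # keep positive evidence only
    heatmap /= (tf.reduce_max(heatmap) + 1e-8)  # normalize to [0, 1]
    return heatmap.numpy()
```

The resulting heatmap (at the conv layer's spatial resolution) is then upscaled and overlaid on the original MRI to produce the visualizations described above.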

Conclusion and Future Work

This project successfully implemented an end-to-end pipeline for classifying brain tumors. It demonstrates the power of data augmentation, the utility of transfer learning, and the absolute necessity of explainability methods like Grad-CAM in medical AI.
While the baseline performance is promising, the next steps are clear:
1. Fine-Tuning: Unfreeze some of the later layers of the MobileNetV2 model and re-train it on the data (with a very low learning rate) to adapt its learned features to the specific nuances of MRI scans.
2. Hyperparameter Tuning: I ran a brief search with Keras Tuner, but a more exhaustive search for the optimal learning rate, dropout, and dense-layer architecture could yield significant improvements.
3. Deployment: I plan to deploy this model as an interactive web application using Streamlit, allowing users to (hypothetically) upload an MRI and receive a classification.
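As a sketch of the fine-tuning step, one common recipe is to unfreeze only the top of the base network and recompile with a much lower learning rate. The cut-off of 30 layers and the 1e-5 rate below are assumptions, not tested settings:

```python
import tensorflow as tf

def unfreeze_top(base, model, n_trainable=30, lr=1e-5):
    """Unfreeze the last n_trainable layers of the pre-trained base and
    recompile at a very low learning rate (both values are assumptions)."""
    base.trainable = True
    for layer in base.layers[:-n_trainable]:
        layer.trainable = False  # keep the earlier, more generic layers frozen
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```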
Thank you for reading! You can find the complete code, analysis, and all the visualizations in my Kaggle Notebook.