Gradient boosting

Project

Machine learning for concussion diagnosis

Data Science · Machine Learning · Classification


Overview

Concussions are diagnosed mainly from self-reported symptoms. Functional MRI (fMRI), which measures brain activity indirectly, may allow more objective diagnosis, because disturbances in the fMRI signal can serve as a proxy for the neuropathology of concussion. The full code is hosted on GitHub.

Approach

This project was completed in Python, using packages and tools such as:

  • scikit-learn
  • GradientBoostingClassifier (a scikit-learn estimator)
  • SMOTE (Synthetic Minority Over-sampling Technique)
  • dash

Resting-state functional MRI (rs-fMRI) offers a noninvasive way to explore brain activity in individuals with concussion. But the signal itself, the blood-oxygen-level-dependent (BOLD) timeseries, can be summarized in many ways. Entropy, amplitude, variability… which metric is most informative? And which brain regions matter most for diagnosis?
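To make the amplitude and variability metrics concrete, here is a minimal sketch of how the mean, standard deviation, ALFF (amplitude of low-frequency fluctuations), and fractional ALFF could be computed for a single ROI's timeseries. The helper name `bold_summary`, the 0.01 to 0.08 Hz band, and the 2-second repetition time are illustrative assumptions, and ALFF has several scaling conventions; this is not necessarily how the project's features were derived.

```python
import numpy as np


def bold_summary(ts, tr, band=(0.01, 0.08)):
    """Summarize one ROI timeseries (hypothetical helper, not from the repo).

    ts   : 1-D array of BOLD samples
    tr   : repetition time in seconds (sampling interval)
    band : low-frequency band in Hz (assumed; conventions vary)
    """
    ts = np.asarray(ts, dtype=float)
    centered = ts - ts.mean()

    # One-sided amplitude spectrum of the mean-centered signal.
    freqs = np.fft.rfftfreq(ts.size, d=tr)
    amplitude = np.abs(np.fft.rfft(centered)) / ts.size

    in_band = (freqs >= band[0]) & (freqs <= band[1])
    nonzero = freqs > 0  # exclude the DC component

    return {
        "mean": ts.mean(),
        "sd": ts.std(ddof=1),
        # ALFF: average amplitude inside the low-frequency band.
        "alff": amplitude[in_band].mean(),
        # fALFF: in-band amplitude as a fraction of all non-DC amplitude.
        "falff": amplitude[in_band].sum() / amplitude[nonzero].sum(),
    }


# Two toy signals: one oscillating inside the band, one outside it.
tr = 2.0
t = np.arange(200) * tr
slow = 100 + np.sin(2 * np.pi * 0.05 * t)
fast = 100 + np.sin(2 * np.pi * 0.20 * t)

slow_summary = bold_summary(slow, tr)
fast_summary = bold_summary(fast, tr)
```

In this toy case the slow signal should carry nearly all of its amplitude inside the band and the fast signal almost none, which is the contrast fractional ALFF is meant to capture. A perfectly constant series would make the fALFF denominator zero, so real code would need to guard against that.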

To explore these questions, I built a synthetic dataset that mimics the structure of real rs-fMRI-derived features (the real-world clinical data this project is based on are pending publication). For each region of interest (ROI), defined using the Harvard-Oxford neuroanatomical atlas, the dataset includes six metrics commonly used in fMRI analysis: the BOLD signal's mean, its standard deviation, two entropy-related measures (the Lyapunov exponent and the Hurst exponent), ALFF (amplitude of low-frequency fluctuations), and fractional ALFF (fALFF).
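As a rough illustration of that layout, the sketch below builds a feature matrix with one column per ROI and metric pair, then shifts a chosen column for the concussion group. The ROI names, the `roi__metric` column naming, the class ratio, and the effect size are all placeholders invented for this example; the project's actual generator may differ.

```python
import numpy as np

METRICS = ("mean", "sd", "lyapunov", "hurst", "alff", "falff")


def make_synthetic_features(
    n_subjects=120,
    rois=("roi_a", "roi_b", "roi_c"),   # placeholder ROI names
    informative=("roi_c__alff",),       # columns that carry group signal
    effect=2.0,                         # assumed shift for the concussion group
    concussion_rate=0.3,                # assumed class imbalance
    seed=0,
):
    """Return (X, y, columns) for a toy ROI-by-metric dataset."""
    rng = np.random.default_rng(seed)
    columns = [f"{roi}__{metric}" for roi in rois for metric in METRICS]

    # Labels: 1 = concussion, 0 = control.
    y = (rng.random(n_subjects) < concussion_rate).astype(int)

    # Start from pure noise, then add signal to the informative columns only.
    X = rng.normal(size=(n_subjects, len(columns)))
    for name in informative:
        X[:, columns.index(name)] += effect * y

    return X, y, columns


X, y, columns = make_synthetic_features()
```

Keeping the ROI and metric in the column name makes it straightforward to group results by either one later on.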

These features were used to train a gradient boosting classifier to distinguish between concussion and control cases. The pipeline handles imbalanced classes using SMOTE, standardizes inputs, and tunes hyperparameters using randomized search with cross-validation.

I then wrapped the primary outputs in an interactive Dash dashboard. The result: a browser-based interface that lets you explore the ROC and precision-recall curves interactively, with English-language summaries explaining each data point. The dashboard also includes a confusion matrix, cross-validation scores, feature importances, and a glossary of core machine learning terms.
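The quantities behind those panels can be computed with scikit-learn, and a short helper can turn a point on the ROC curve into a sentence. The sketch below is illustrative only: `describe_roc_point`, its wording, and the toy labels and scores are invented here and are not taken from the dashboard's code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_curve, roc_curve


def describe_roc_point(fpr, tpr, threshold):
    """Plain-English summary of one ROC point (hypothetical helper)."""
    return (
        f"At a threshold of {threshold:.2f}, the model flags {tpr:.0%} of "
        f"true concussion cases while mislabeling {fpr:.0%} of controls."
    )


# Toy labels (1 = concussion) and predicted probabilities.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80])

# Curve coordinates that a dashboard could plot.
fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)

# Confusion matrix at a fixed 0.5 cutoff: rows are true classes,
# columns are predicted classes.
cm = confusion_matrix(y_true, (y_score >= 0.5).astype(int))

summary = describe_roc_point(fpr=0.10, tpr=0.80, threshold=0.50)
```

In an interactive app, a function like this could be called from a hover or click callback with the coordinates of the selected point.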

Below is a screenshot of the dashboard; the full interactive version can be accessed here. (It is hosted on Render's free tier, so it can take some time to load.)

image

This project isn't about finding a definitive clinical answer, because the data are synthetic. As in the real clinical data, the results suggest that one metric (#5) is more discriminative than the others, as are select ROIs (such as those in the cerebellum). More importantly, the project is designed to do three things:

  1. Surface interpretability: Metrics like precision and recall are often cited but rarely explained. This app gives context with every click.
  2. Demonstrate reproducibility: The full pipeline (training, evaluation, visualization) is contained in a clean repo, using safe-to-share data.
  3. Scale for future use: With real data, the same code could easily power decision support tools or research prototypes.
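The comparison across metrics and ROIs mentioned above can be sketched by summing a fitted model's feature importances per metric and per ROI. Everything in this example is assumed for illustration: the `roi__metric` column naming, the placeholder ROI names, the toy data with one deliberately informative column, and the `aggregate_importances` helper.

```python
from collections import defaultdict

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rois = ("roi_a", "roi_b", "roi_c")  # placeholder ROI names
metrics = ("mean", "sd", "lyapunov", "hurst", "alff", "falff")
columns = [f"{roi}__{metric}" for roi in rois for metric in metrics]

# Toy data: noise everywhere, plus a strong shift in one column.
rng = np.random.default_rng(0)
y = np.tile([0, 1], 100)
X = rng.normal(size=(200, len(columns)))
X[:, columns.index("roi_c__alff")] += 3.0 * y

model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)


def aggregate_importances(importances, names, part):
    """Sum importances by ROI (part=0) or by metric (part=1)."""
    totals = defaultdict(float)
    for name, value in zip(names, importances):
        totals[name.split("__")[part]] += float(value)
    return dict(totals)


by_roi = aggregate_importances(model.feature_importances_, columns, part=0)
by_metric = aggregate_importances(model.feature_importances_, columns, part=1)

top_roi = max(by_roi, key=by_roi.get)
top_metric = max(by_metric, key=by_metric.get)
```

Impurity-based importances like these are convenient but can be biased toward noisy continuous features, so a permutation-based check on held-out data would be a sensible complement.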

This project is part of an ongoing exploration of how visual and narrative explanations can support technical decision-making — particularly in healthcare, where both the stakes and the uncertainty are high.

The full code is available on GitHub, and the app is hosted for free on Render.

Outcomes

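Concussions are currently diagnosed subjectively, but fMRI data combined with machine learning may help move the field toward objective measurement and diagnosis. This project lays groundwork for that shift with a reproducible, interpretable pipeline that can be rerun once real data are available.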

Credits

Original clinical data (publication pending).