De Novo Identification and Visualization of Important Cell Populations for Classic Hodgkin Lymphoma Using Flow Cytometry and Machine Learning.
Academic Article
Overview
Abstract
OBJECTIVES: Automated classification of flow cytometry data has the potential to reduce errors and accelerate flow cytometry interpretation. We sought a machine learning approach that is accurate, is intuitively easy to understand, and highlights the cells that are most important to the algorithm's prediction for a given case. METHODS: We developed an ensemble of convolutional neural networks that uses two-dimensional (2D) histograms to detect classic Hodgkin lymphoma and to visualize the cell populations that most influence each prediction. Data from 977 and 245 clinical flow cytometry cases were used for training and testing, respectively. Seventy-eight nongated 2D histograms were created per flow cytometry file. Shapley additive explanation (SHAP) values were calculated to determine the most impactful 2D histograms and the most impactful regions within each histogram. SHAP values from all 78 histograms were then projected back to the original cell data for gating and visualization using standard flow cytometry software. RESULTS: The algorithm achieved 67.7% recall (sensitivity), 82.4% precision, and an area under the receiver operating characteristic curve of 0.92. Visualization of the important cell populations for individual predictions demonstrated correlations with known biology. CONCLUSIONS: The method presented enables model explainability while highlighting important cell populations in individual flow cytometry specimens, with potential applications in both diagnosis and discovery of previously overlooked key cell populations.
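The histogram and back-projection steps described in METHODS can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the number of measured channels; 78 happens to equal the number of unordered pairs among 13 channels, so the sketch assumes 13. The bin count, the scaling of events to [0, 1], the function names `pairwise_histograms` and `project_to_cells`, and the random per-bin scores standing in for real SHAP values are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def pairwise_histograms(events, bins=32, value_range=(0.0, 1.0)):
    """Build one nongated 2D histogram per unordered pair of channels.

    events: array of shape (n_cells, n_channels), assumed pre-scaled
    to value_range (hypothetical preprocessing, not from the abstract).
    """
    pairs = list(combinations(range(events.shape[1]), 2))
    hists = np.empty((len(pairs), bins, bins))
    for k, (i, j) in enumerate(pairs):
        hists[k], _, _ = np.histogram2d(
            events[:, i], events[:, j],
            bins=bins, range=[value_range, value_range],
        )
    return pairs, hists


def project_to_cells(events, pairs, bin_scores, bins=32, value_range=(0.0, 1.0)):
    """Sum, for each cell, the score of the bin it occupies in every histogram.

    bin_scores: array of shape (n_pairs, bins, bins), e.g. per-bin SHAP
    values from a model explainer (here only a placeholder).
    """
    lo, hi = value_range
    # Bin index of every cell on every channel, matching the histogram edges.
    idx = np.clip(((events - lo) / (hi - lo) * bins).astype(int), 0, bins - 1)
    cell_scores = np.zeros(events.shape[0])
    for k, (i, j) in enumerate(pairs):
        cell_scores += bin_scores[k, idx[:, i], idx[:, j]]
    return cell_scores


# Synthetic stand-in for one flow cytometry file: 5000 cells, 13 channels.
rng = np.random.default_rng(0)
events = rng.random((5000, 13))

pairs, hists = pairwise_histograms(events)
print(len(pairs))  # 78 channel pairs for 13 channels

# Placeholder per-bin importance; real values would come from SHAP.
fake_bin_scores = rng.normal(size=hists.shape)
cell_scores = project_to_cells(events, pairs, fake_bin_scores)

# Cells with the highest summed score could then be exported for gating.
top_cells = np.argsort(cell_scores)[-100:]
```

Summing scores across histograms gives each cell a single importance value, which is one simple way to carry histogram-level attributions back to event-level data so that they can be gated in standard flow cytometry software; other aggregation choices are possible.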