This retrospective study assembled an international cohort of 6206 leukemia patients from 20 centers to test and refine an artificial intelligence (AI) tool designed to support leukemia diagnosis using standard laboratory results. The goal was to address health disparities by potentially improving access to diagnosis. The pretrained algorithm was executed on this diverse cohort, yielding varying accuracy metrics. When a confidence cutoff was applied to predictions, the 2000-fold bootstrapped area under the receiver operating characteristic curve (AUROC) was 0.94 for acute myeloid leukemia (AML), 0.98 for the promyelocytic subtype, and 0.84 for acute lymphoblastic leukemia (ALL). However, this confidence cutoff excluded a substantial proportion of patients from receiving predictions, ranging from 70.8% to 92.5%. To improve the tool's utility, the researchers enhanced its accuracy and robustness while maintaining generalizability. They implemented an ensemble method combining Isolation Forest and Local Outlier Factor. This refinement increased the AUROC for AML from 0.72 to 0.84 on a hold-out test set restricted to patients who fell below the initial confidence threshold. Importantly, the improved model excluded only 12.1% of patients from predictions, a substantial reduction from the earlier exclusion rates. In addition, the algorithm was retrained specifically for pediatric patients. The study demonstrates a process of international testing and iterative refinement of an AI diagnostic support tool, showing that modifications can substantially reduce the rate of excluded patients while improving performance for a subset of cases.
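To make the confidence-cutoff evaluation concrete, the sketch below shows one way to compute a bootstrapped AUROC only for patients on whom the model is confident, while tracking how many patients are excluded. The cutoff value, function name, and resampling details are illustrative assumptions, not the study's exact protocol.

```python
# Minimal sketch of confidence-gated, bootstrapped AUROC evaluation.
# The 0.9 cutoff and the 2000 resamples are assumptions for illustration.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrapped_auroc(y_true, y_prob, confidence_cutoff=0.9, n_boot=2000, seed=0):
    """AUROC over confident predictions only, with bootstrap resampling.

    y_true : binary labels (1 = target diagnosis, e.g. AML)
    y_prob : predicted probability of the target class for each patient
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    rng = np.random.default_rng(seed)

    # Keep only patients on whom the model is confident either way.
    confident = np.maximum(y_prob, 1.0 - y_prob) >= confidence_cutoff
    excluded_fraction = 1.0 - confident.mean()
    y_t, y_p = y_true[confident], y_prob[confident]

    # Resample the retained patients with replacement and average the AUROC.
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_t), size=len(y_t))
        if len(np.unique(y_t[idx])) < 2:
            continue  # a resample containing a single class has no defined AUROC
        scores.append(roc_auc_score(y_t[idx], y_p[idx]))

    return float(np.mean(scores)), float(excluded_fraction)
```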
Imagine having symptoms that could be leukemia, but you can't get to a specialist for the complex tests needed to confirm it. This is a reality for many people around the world. A team looked at whether an artificial intelligence (AI) tool could help by using just the results from standard blood tests to predict the type of leukemia a person might have. They tested it on a large, diverse group of over 6,200 patients from 20 different centers worldwide. The tool was very good at spotting certain types, like acute myeloid leukemia and a specific subtype called promyelocytic leukemia. But there was a big catch: to get that high accuracy, the tool had to refuse to make a prediction for the vast majority of patients, between 71% and 93% of the time. That's not very helpful for doctors. So, they refined the tool using a different method. This new version was less likely to refuse a prediction, excluding only about 12% of patients, and its accuracy for spotting acute myeloid leukemia in those uncertain cases improved. They also retrained the tool specifically for children. The work shows that AI could one day be a useful support tool, helping more people get a faster initial indication of their condition using tests they can already get.
What this means for you: An AI tool can predict leukemia types from basic lab work, and after refinement it can offer a prediction to far more patients instead of declining the uncertain cases.
Original Abstract:
Despite advances for patients with acute leukemia, health disparities limit access to diagnosis and treatment. Artificial intelligence (AI) approaches may address some disparities. We retrospectively assemble a diverse, international cohort of 6206 leukemia patients from 20 centers to test an AI tool designed to support leukemia diagnosis using standard laboratory results. Executing the pretrained algorithm results in varying accuracy metrics. With confidence-cutoff predictions, 2000-fold bootstrapped area under the receiver operating characteristic curve (AUROC) metrics are 0.94 for acute myeloid leukemia (AML), 0.98 for the promyelocytic subtype, and 0.84 for acute lymphoblastic leukemia. However, this cutoff excludes 70.8-92.5% of patients from predictions. We improve accuracy and robustness, while maintaining generalizability, via an ensemble of Isolation Forest and Local Outlier Factor, increasing AUROC for AML from 0.72 to 0.84 (hold-out test set, patients below confidence threshold) while excluding only 12.1% of patients. Furthermore, we retrain the algorithm for pediatric patients.
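For readers curious how an ensemble of Isolation Forest and Local Outlier Factor might gate predictions on atypical lab profiles, here is a minimal scikit-learn sketch. The class name, the contamination setting, and the rule of excluding a patient only when both detectors flag them are assumptions for illustration, not the authors' published implementation.

```python
# Minimal sketch of an Isolation Forest + Local Outlier Factor "gate".
# Names, parameters, and the combination rule are illustrative assumptions.
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

class OutlierGate:
    """Withholds predictions for patients whose lab profiles look atypical."""

    def __init__(self, contamination=0.05, random_state=0):
        self.iforest = IsolationForest(contamination=contamination,
                                       random_state=random_state)
        # novelty=True lets LocalOutlierFactor score previously unseen patients.
        self.lof = LocalOutlierFactor(contamination=contamination, novelty=True)

    def fit(self, X_train):
        # Both detectors learn what typical lab-value profiles look like.
        self.iforest.fit(X_train)
        self.lof.fit(X_train)
        return self

    def keep_mask(self, X):
        # predict() returns +1 for inliers and -1 for outliers.
        inlier_if = self.iforest.predict(X) == 1
        inlier_lof = self.lof.predict(X) == 1
        # Exclude a patient only when both detectors call the profile an outlier.
        return inlier_if | inlier_lof

# Hypothetical usage: the downstream leukemia classifier would only issue
# predictions for patients retained by the gate.
# gate = OutlierGate().fit(X_train)   # X_train: lab features of the training cohort
# mask = gate.keep_mask(X_new)        # X_new: lab features of new patients
# confident_predictions = classifier.predict(X_new[mask])
```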