In-Person vs AI Prediction for Suicide Risk—Why Not Both?

‘Ensemble’ models offer better predictions than either model alone

05/13/2022
John McKenna, Associate Editor, BreakingMED™
Kevin Rodowicz, DO, Assistant Professor, St. Luke’s University/Temple University
Take Away
  1. Combining in-person screening using the Columbia Suicide Severity Rating Scale (C-SSRS) with EHR-based screening using the Vanderbilt Suicide Attempt Ideation Likelihood (VSAIL) model produced better suicide risk prediction than either model alone.

  2. These findings suggest that suicide risk prediction models combining in-person screening with EHR-based machine learning may mitigate the respective weaknesses of both data sources and synthesize their complementary strengths to produce a better risk prediction.

Combining face-to-face, in-person screening with a machine learning program offered better suicide risk prediction than either screening strategy alone, according to findings from a study published in JAMA Network Open.

Universal screening for suicide risk in emergency departments has come a long way in improving suicide risk detection among U.S. adults, and the Columbia Suicide Severity Rating Scale (C-SSRS), a standardized assessment of suicidal ideation and behavior, has been endorsed by both the CDC and the FDA as a useful tool for predicting risk. Research has suggested that predictive models using electronic health record (EHR) data might further enhance clinicians’ ability to predict suicide risk, Colin G. Walsh, MD, MA, of Vanderbilt University Medical Center in Nashville, and colleagues explained. But would adding machine learning to in-person screening actually provide complementary results?

To find out, Walsh and colleagues used data from the Vanderbilt University Medical Center (VUMC) to evaluate the ability of C-SSRS alone and the Vanderbilt Suicide Attempt Ideation Likelihood (VSAIL) model alone to predict suicide attempt (SA) and suicidal ideation (SI), before analyzing the performance of combined C-SSRS and VSAIL, in order to determine whether this "ensemble model" improved performance.

They found that "the combination of the C-SSRS and VSAIL models outperformed either alone in the prediction of SA and SI at all time intervals. By leveraging the complementary strengths of historical EHR data and face-to-face screening, ensemble learning improved discrimination at various risk thresholds. In the highest risk decile for SA at 30 days, only the ensemble models surpassed thresholds (for [positive predictive value] PPV and sensitivity) required for suicide prediction models to deliver health economic benefit. We found this improvement (especially in PPV) to be clinically significant, although the costs and benefits of our ensemble approach will vary greatly between health care sites."

The study authors concluded that "[f]urther research is needed to compare alternate ways of combining clinical and statistical risk prediction and to analyze the practical implications of implementing them in clinical systems."

For their analysis, Walsh and colleagues created an observational cohort of adult patients seen at VUMC from June 2019 through September 2020. For each screened patient’s index visit, they extracted the C-SSRS responses along with the VSAIL risk score generated at the beginning of that encounter. Included encounters were inpatient, ambulatory surgical, and emergency department visits.

"The primary outcomes in this study were SA and SI, defined as separate events by coded self-injurious thoughts and behaviors (SITBs) occurring within 7, 30, 60, 90, and 180 days after the discharge date of each documented visit during the time period," they explained. "SITBs were extracted from encounter diagnosis documentation encoded as International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) and International Statistical Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) stored in our EHR. To avoid circularity when using the C-SSRS to predict future SI and SA, we defined these outcomes exclusively through ICD codes without any reference to clinical screening."

The C-SSRS consists of six questions related to suicidal thoughts and behaviors, which were asked with respect to occurrences within the past month or since the patient’s last assessment. The VSAIL model uses historical EHR data (demographic data, diagnostic codes, medication data, past health care utilization, and area deprivation index via patient zip code) to silently calculate predictions of suicide risk at the start of routine clinical visits.
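
The windowed outcome definition quoted above amounts to a labeling step applied to every index visit. The sketch below is a minimal illustration, not the study’s pipeline: it assumes the dates of ICD-coded SITB events for one outcome type (SA or SI) have already been extracted for the patient, and it assumes an event counts toward a window if it falls after the discharge date and no more than N days later (the article does not say how same-day events were handled). The names `label_windows` and `WINDOWS_DAYS` are hypothetical.

```python
from datetime import date, timedelta

# Follow-up windows (days after discharge) named in the study's outcome definition.
WINDOWS_DAYS = (7, 30, 60, 90, 180)


def label_windows(discharge: date, event_dates: list[date],
                  windows: tuple[int, ...] = WINDOWS_DAYS) -> dict[int, bool]:
    """Label one visit for one outcome type (SA or SI).

    Assumption (not stated in the article): an event counts toward a window if it
    falls strictly after the discharge date and at most `days` days later.
    """
    labels = {}
    for days in windows:
        window_end = discharge + timedelta(days=days)
        labels[days] = any(discharge < d <= window_end for d in event_dates)
    return labels


# Example: a coded SA 19 days after discharge is positive from the 30-day window on.
print(label_windows(date(2020, 1, 1), [date(2020, 1, 20)]))
# {7: False, 30: True, 60: True, 90: True, 180: True}
```

Because the labels come only from coded events, the C-SSRS responses never enter this step, which is how the authors avoid the circularity they describe.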

Walsh and colleagues assessed the retrospective validity of the C-SSRS, VSAIL, and ensemble models. Performance metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPR), sensitivity, specificity, and positive and negative predictive value (PPV/NPV).
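
For readers less familiar with these metrics, here is a minimal pure-Python sketch of how they are computed from risk scores and observed outcomes. It is illustrative only, not the study’s code, and the function names are hypothetical. AUROC is computed directly from its definition (the chance that a random positive visit outscores a random negative one), which is O(n²) and fine for illustration; at the scale of 120,398 visits a rank-based or library implementation would be more practical. AUPR is not shown.

```python
import math


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive visit scores
    higher than a randomly chosen negative one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _ratio(num, den):
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return num / den if den else math.nan


def threshold_metrics(scores, labels, threshold):
    """Sensitivity, specificity, PPV and NPV when visits scoring >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y:
            tp += 1
        elif flagged:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

AUROC summarizes ranking quality across all thresholds, while sensitivity, specificity, PPV, and NPV describe one chosen cutoff; that is why the study reports both kinds.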

The final study cohort consisted of 120,398 unique index visits across 83,394 patients; the mean patient age was 51.2 years, 54% were women, and 77% were White. In total, SA was identified in 84 cases at 7 days (0.07%), 205 at 30 days (0.17%), 272 at 60 days (0.23%), 356 at 90 days (0.30%), and 514 at 180 days (0.43%).
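
The SA percentages are simple proportions of the 120,398 index visits (visits, not patients), which the short script below recomputes from the reported counts. It is a worked check of the arithmetic only; the names are illustrative.

```python
# Reported counts of index visits followed by a coded suicide attempt (SA), by window.
TOTAL_VISITS = 120_398
SA_COUNTS = {7: 84, 30: 205, 60: 272, 90: 356, 180: 514}


def incidence_pct(count: int, total: int = TOTAL_VISITS) -> float:
    """Percentage of index visits followed by at least one SA within the window."""
    return 100.0 * count / total


for days, count in SA_COUNTS.items():
    print(f"{days:>3} days: {count:>3} visits ({incidence_pct(count):.2f}%)")
```

Even at 180 days, fewer than half a percent of visits were followed by a coded SA, which is the low base rate that shapes every PPV figure below.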

Among the findings:

  • Within 30 days of an index visit, the combined models had higher AUROC (SA: 0.874-0.887; SI: 0.869-0.879) than both the VSAIL (SA: 0.729; SI: 0.773) and C-SSRS (SA: 0.823; SI: 0.777) models.
  • In the highest risk decile, ensemble methods had PPV of 1.3% to 1.4% for SA and 8.3% to 8.7% for SI, outperforming both VSAIL (PPV for SA: 0.4%; PPV for SI: 3.9%) and C-SSRS (PPV for SA: 0.5%; PPV for SI: 3.5%).
  • In the highest risk decile, ensemble methods had sensitivity of 77.6% to 79.5% for SA and 67.4% to 70.1% for SI, far exceeding VSAIL (sensitivity for SA: 28.8%; sensitivity for SI: 35.1%) and comparable to C-SSRS (sensitivity for SA: 76.6%; sensitivity for SI: 68.8%).
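
"Highest risk decile" metrics come from flagging the top 10% of visits by predicted risk and asking how many flagged visits had the outcome (PPV) and how many outcomes were flagged (sensitivity). A minimal sketch, assuming ties at the cutoff are broken by input order (the article does not say how ties were handled); the function name is hypothetical.

```python
def top_decile_metrics(scores, labels, n_groups=10):
    """PPV and sensitivity when the highest-scoring 1/n_groups of visits are flagged.

    Ties at the cutoff are broken by input order, because Python's sort is stable.
    """
    n = len(scores)
    k = -(-n // n_groups)  # ceiling division: number of visits in the top group
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = order[:k]
    tp = sum(1 for i in flagged if labels[i])
    total_pos = sum(1 for y in labels if y)
    ppv = tp / k if k else float("nan")
    sensitivity = tp / total_pos if total_pos else float("nan")
    return ppv, sensitivity


# Toy example: 20 visits, 2 outcomes, only one of which lands in the top 2 scores.
print(top_decile_metrics(list(range(20, 0, -1)), [1, 0, 1] + [0] * 17))  # (0.5, 0.5)
```

The low base rate caps PPV: if roughly 0.17% of visits are followed by SA within 30 days and 10% of visits are flagged, then even a model that caught every case would reach a PPV of only about 1.7% (assuming the prevalence and decile are computed over the same visits). Against that ceiling, the ensembles’ 1.3% to 1.4% is a substantial share.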

Walsh and colleagues argued that the combination of C-SSRS and VSAIL models may have been capable of outperforming either model alone because ensemble models "might have benefited from the relative strengths of the VSAIL and C-SSRS regression models at lower and higher risk thresholds, respectively. The C-SSRS predictions might have been limited by the commonality of patients denying SI despite being at high risk of SA and death. Performance of the VSAIL model may have suffered because some observations in the analysis did not have extensive historical clinical data. Ensemble methods seemed to mitigate the respective weaknesses of both data sources by exploiting their diversity while also synthesizing their independent, complementary strengths."
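
The article does not describe exactly how the ensemble models combined the two scores. One common approach, shown here purely as an assumption, is stacking: each model’s output becomes an input to a small logistic regression that learns how much weight to give each source. In this sketch `x_ehr` stands in for a VSAIL-style probability and `x_screen` for a C-SSRS-derived score; those names, the learning rate, epoch count, and L2 strength are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stacker(x_ehr, x_screen, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit p(y=1) = sigmoid(b + w_ehr*x_ehr + w_screen*x_screen) by batch gradient
    descent on mean log loss plus (l2/2)*(w_ehr**2 + w_screen**2); the intercept
    is not penalized."""
    n = len(y)
    b = w_e = w_s = 0.0
    for _ in range(epochs):
        g_b = g_e = g_s = 0.0
        for xe, xs, yi in zip(x_ehr, x_screen, y):
            err = sigmoid(b + w_e * xe + w_s * xs) - yi  # d(log loss)/d(logit)
            g_b += err
            g_e += err * xe
            g_s += err * xs
        b -= lr * g_b / n
        w_e -= lr * (g_e / n + l2 * w_e)
        w_s -= lr * (g_s / n + l2 * w_s)
    return b, w_e, w_s


def predict(model, xe, xs):
    b, w_e, w_s = model
    return sigmoid(b + w_e * xe + w_s * xs)


# Toy data (hypothetical): neither input alone separates the outcomes, the pair does better.
x_ehr    = [0, 0, 1, 0, 1, 0, 1, 1]
x_screen = [0, 0, 0, 1, 0, 1, 1, 1]
y        = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_stacker(x_ehr, x_screen, y)
print([round(predict(model, a, s), 3) for a, s in [(0, 0), (1, 0), (0, 1), (1, 1)]])
```

In a real evaluation the stacker would need to be assessed on visits not used to fit it; otherwise the combined model’s advantage would be overstated.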

The study authors also acknowledged that in-person screening and EHR-based models might have strengths and weaknesses beyond predictive performance: "In-person screening requires time, mental health resources (which are often limited), training on standardized assessments (e.g., the C-SSRS), support from health care administrators, and workflow modifications. An important benefit of clinical screening is that it creates an opportunity for patient-physician dialogue that can lead to individualized treatment interventions. Although EHR-based machine learning can be automated at scale, developing, validating, and implementing predictive models requires a substantial initial resource investment. Ethical and legal issues around privacy, data usage, and accountability also hinder the adoption of machine learning in health care."

"Untangling the risks and benefits of suicide prevention approaches is difficult in cohort studies for several reasons, even when dealing with very large samples," Jordan E. DeVylder, PhD, of Fordham University in New York, wrote in an invited commentary accompanying the study. "For one, the low base rate of suicidal behavior (particularly over short time periods, when risk detection may be most useful) limits statistical power available to make nuanced comparisons between various methods of suicide prevention and, particularly, between subgroups of the general population. Furthermore, related studies to date have typically used suicide attempts or suicidal behavior as the primary outcomes, which can serve only as a proxy for the primary outcome of interest, death by suicide. Research focused on death by suicide, however, suffers from an even lower base rate than nonfatal suicidal behaviors and involves the sometimes onerous and logistically difficult process of combining cohort data with official death record data."

Regardless, Walsh and colleagues argued that their findings support using an EHR-based model as an initial detection tool to prompt additional in-person screening for suicide risk; or, alternatively, "[i]n settings where screening is widely administered, statistical prediction might be used secondarily to identify cases without prior suicidal behavior or with low screening risk due to nondisclosure. In-person screening and EHR-based models could also be (and often would be) implemented in parallel and combined with an ensemble model that outputs a final risk prediction and triggers clinical action."
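
The first two deployment options quoted above can be expressed as a simple routing rule: the EHR model prompts an in-person screen for unscreened high-risk patients, and acts as a secondary check when a screen is negative, to catch possible nondisclosure. This is a hypothetical sketch only; the threshold, the action labels, and the function name are illustrative and not taken from the study. The third, parallel option would replace these two separate checks with a single combined model score.

```python
from typing import Optional


def next_action(ehr_risk: float, screen_positive: Optional[bool],
                ehr_threshold: float) -> str:
    """Hypothetical triage rule.

    ehr_risk: EHR model's predicted probability for this visit.
    screen_positive: in-person screen result, or None if the patient was not screened.
    ehr_threshold: site-chosen cutoff (illustrative; not from the study).
    """
    if screen_positive is None:
        # EHR model as the initial detector: high risk prompts an in-person screen.
        return "prompt in-person screening" if ehr_risk >= ehr_threshold else "no action"
    if screen_positive:
        return "clinical follow-up"
    # Negative screen: secondary EHR check for possible nondisclosure.
    return "clinician review" if ehr_risk >= ehr_threshold else "no action"
```

Where the threshold sits would be a local decision, since, as the authors note, the costs and benefits of these approaches vary greatly between health care sites.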

DeVylder concluded that Walsh et al. "have provided us with valuable data on the potential benefits of a combined approach, which can now be weighed against some of these other considerations to determine the best approach for a particular clinical setting."

Disclosures

Walsh reported receiving grants from the National Institutes of Health, the US Food and Drug Administration, the Military Suicide Research Consortium, Wellcome Leap, the Selby Stead Fund, Vanderbilt University Medical Center, and the Tennessee Department of Health; receiving personal fees from Southeastern Home Office Underwriters Association and Hannover Re; and holding equity in Sage AI outside the submitted work.

DeVylder had no relevant relationships to disclose.

Sources

Walsh CG, et al "Integration of face-to-face screening with real-time machine learning to predict risk of suicide among adults" JAMA Netw Open 2022; 5(5):e2212095.

DeVylder JE "Suicide risk prediction in clinical settings—Additional considerations for face-to-face screening and machine learning approaches" JAMA Netw Open 2022; 5(5):e2212106.