Machine Learning Algorithms Predict Achievement of Clinically Significant Outcomes After Orthopaedic Surgery: A Systematic Review.
Review
Overview
abstract
PURPOSE: To determine which subspecialties within orthopaedic surgery have applied machine learning (ML) to predict clinically significant outcomes (CSOs), and to determine whether the performance of these models was acceptable by assessing discrimination and other ML metrics where reported.
METHODS: The PubMed, EMBASE, and Cochrane Central Register of Controlled Trials databases were queried for articles that used ML to predict achievement of the minimal clinically important difference (MCID), patient acceptable symptomatic state (PASS), or substantial clinical benefit (SCB) after orthopaedic surgical procedures. Data pertaining to demographic characteristics, subspecialty, specific ML algorithms, and algorithm performance were analyzed.
RESULTS: Eighteen articles met the inclusion criteria. Seventeen studies developed novel algorithms, whereas one study externally validated an established algorithm. All studies used ML to predict MCID achievement, whereas 3 (16.7%) predicted SCB achievement and none predicted PASS achievement. Of the studies, 7 (38.9%) concerned outcomes after spine surgery; 6 (33.3%), after sports medicine surgery; 3 (16.7%), after total joint arthroplasty (TJA); and 2 (11.1%), after shoulder arthroplasty. No studies were found regarding trauma, hand, elbow, pediatric, or foot and ankle surgery. In spine surgery, concordance statistics (C-statistics) ranged from 0.65 to 0.92; in hip arthroscopy, 0.51 to 0.94; in TJA, 0.63 to 0.89; and in shoulder arthroplasty, 0.70 to 0.95. Most studies reported C-statistics at the upper end of these ranges, although populations were heterogeneous.
CONCLUSIONS: Currently available ML algorithms can discriminate the propensity to achieve CSOs, as defined by the MCID, after spine, TJA, sports medicine, and shoulder surgery with fair to good performance, as evidenced by C-statistics ranging from 0.6 to 0.95 in most analyses. Less evidence is available on the ability of ML to predict achievement of SCB, and no evidence is available for achievement of the PASS. Such algorithms may augment shared decision-making practices and allow clinicians to set more appropriate patient expectations using individualized risk assessments. However, these studies remain limited by variable reporting of performance metrics, inconsistent CSO quantification methods, and variable adherence to predictive modeling guidelines, as well as limited external validation.
LEVEL OF EVIDENCE: Level III, systematic review of Level III studies.
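For readers unfamiliar with the C-statistic used above as the main discrimination metric, the following is a minimal sketch of how it can be computed for a binary outcome such as MCID achievement. It is not taken from any of the reviewed studies: the function name `c_statistic` and the toy patient data are hypothetical, and the pairwise definition shown (equivalent to the area under the ROC curve for a binary outcome) is assumed to match what the included studies reported.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance statistic for a binary outcome.

    Returns the fraction of (achiever, non-achiever) patient pairs in which
    the achiever received the higher predicted probability. Tied predictions
    count as half a concordant pair.
    """
    achievers = [s for y, s in zip(y_true, y_score) if y == 1]
    non_achievers = [s for y, s in zip(y_true, y_score) if y == 0]
    if not achievers or not non_achievers:
        raise ValueError("need at least one patient in each outcome group")

    concordant = 0.0
    for p, n in product(achievers, non_achievers):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(achievers) * len(non_achievers))


# Hypothetical toy data: 1 = achieved the MCID, 0 = did not.
achieved_mcid = [1, 1, 1, 0, 0, 1, 0, 0]
predicted_prob = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]

c = c_statistic(achieved_mcid, predicted_prob)
print(f"C-statistic: {c:.3f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of achievers from non-achievers, which is the scale on which the ranges reported above (for example, 0.65 to 0.92 in spine surgery) should be read.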