Reliability and accuracy of resident evaluations of surgical faculty.
Abstract
This study examines the reliability and accuracy of general surgery residents' ratings of surgical faculty. Twenty-three of 33 residents anonymously and voluntarily evaluated 62 surgeons in June 1988; 24 of 28 residents evaluated 64 surgeons in June 1989. Each resident rated each surgeon on a 5-point scale in each of 10 areas of performance: technical ability, basic science knowledge, clinical knowledge, judgment, peer relations, patient relations, reliability, industry, personal appearance, and reaction to pressure. Reliability analyses assessed internal consistency and interrater correlation; accuracy analyses assessed halo error, leniency/severity, central tendency, and range restriction. Ratings showed high internal consistency (coefficient alpha = 0.97), and interrater correlations were moderately high (average pairwise Pearson correlation among raters = 0.63). Ratings were generally accurate; halo error was the most prevalent inaccuracy, and there was some evidence of leniency. Ratings by chief residents showed the least halo error. Results were generally replicable across the two academic years. We conclude that anonymous ratings of surgical faculty by groups of residents can provide a reliable and accurate evaluation method, that ratings by chief residents are the most accurate, and that halo error may pose the greatest threat to accuracy, pointing to the need for clearer definition of evaluation items and scale points.
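The two reliability statistics reported above are standard computations. As an illustration only (the study's actual data are not reproduced here, and the array shapes and variable names below are hypothetical), a minimal Python/NumPy sketch of coefficient (Cronbach's) alpha over the 10 performance items, and of the average pairwise Pearson correlation among raters, might look like this:

import numpy as np

def cronbach_alpha(ratings):
    # ratings: (n_targets, n_items) array, e.g. 62 surgeons x 10 performance areas.
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)
    total_var = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def mean_interrater_r(ratings):
    # ratings: (n_raters, n_targets) array, e.g. 23 residents x 62 surgeons.
    # Average Pearson r over all distinct rater pairs (rows are variables in corrcoef).
    r = np.corrcoef(np.asarray(ratings, dtype=float))
    upper = np.triu_indices_from(r, k=1)
    return r[upper].mean()

# Synthetic 5-point ratings for illustration; real data would replace these.
rng = np.random.default_rng(0)
surgeons_by_item = rng.integers(1, 6, size=(62, 10))
residents_by_surgeon = rng.integers(1, 6, size=(23, 62))
print(cronbach_alpha(surgeons_by_item))
print(mean_interrater_r(residents_by_surgeon))

With random synthetic ratings, both statistics will land near zero; values such as alpha = 0.97 and a mean interrater r of 0.63 arise only when items and raters agree systematically, as they did in this study.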