Comparison of Cohen's Kappa and Gwet's AC1 with a mass shooting classification index: A study of rater uncertainty
Abstract
Inter-rater reliability (IRR) experiments are often conducted in the medical field to quantify the degree of agreement between raters when classifying subjects into predefined categories. Percent agreement was originally used to calculate the extent of agreement between raters; however, it was criticized for not accounting for chance agreement. Chance agreement refers to the propensity for raters to guess when classifying nondeterministic subjects into categories. In other words, raters can be certain that some subjects are textbook cases associated with a true category membership, whereas other subjects are ambiguous and require true random guessing (Schuster & Smith, 2002). Cohen's Kappa has been a commonly used chance-corrected agreement coefficient, but it has known limitations, such as a tendency to overcorrect for chance agreement in the presence of high prevalence rates (i.e., highly skewed data). Due to such issues, Gwet (2014) proposed a new chance-corrected agreement coefficient, the AC1 statistic. The purpose of this study was to examine Cohen's Kappa and Gwet's AC1 with respect to prevalence rates and rater uncertainty using a newly developed classification system for mass shooters. A new methodology for identifying textbook and ambiguous subjects was demonstrated. Specifically, the aims of the present study were (1) to examine how Cohen's Kappa and Gwet's AC1 are affected by prevalence rates and (2) to determine whether the observable discrepancies between Cohen's Kappa and Gwet's AC1 differ for subjects classified as textbook compared to subjects classified as ambiguous. Findings indicated observable discrepancies between Cohen's Kappa and Gwet's AC1 in both the textbook and ambiguous conditions.
Specifically, analyses suggested that percent agreement was likely to overestimate, and Cohen's Kappa likely to underestimate, the extent of true agreement among raters. The ambiguous analysis revealed larger discrepancies between Gwet's AC1 and Cohen's Kappa in the presence of highly skewed data; in the textbook analysis, however, discrepancies between the two coefficients appeared to depend more on the number of observable disagreements between raters. Recommendations for practice and future research are discussed.
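The prevalence effect described above can be illustrated with a small numerical sketch. The Python example below uses hypothetical counts (not data from the study): two raters classify 100 subjects into a highly skewed binary outcome, and percent agreement, Cohen's Kappa, and Gwet's AC1 are computed from the standard formulas. Kappa's chance term is built from the product of each rater's marginal proportions, while AC1's chance term is (1/(K−1))·Σₖ πₖ(1−πₖ), where πₖ is the mean of the two raters' proportions for category k.

```python
# Illustrative sketch with hypothetical counts (not data from the study):
# two raters classify 100 subjects, with high prevalence of "yes".
from collections import Counter

# (rater_a, rater_b) pairs: 90 agree on "yes", 4 agree on "no", 6 disagree.
pairs = ([("yes", "yes")] * 90 + [("no", "no")] * 4 +
         [("yes", "no")] * 3 + [("no", "yes")] * 3)
n = len(pairs)
cats = ["yes", "no"]

# Percent agreement: proportion of subjects the raters classified identically.
p_o = sum(a == b for a, b in pairs) / n

# Marginal category counts for each rater.
p_a = Counter(a for a, _ in pairs)
p_b = Counter(b for _, b in pairs)

# Cohen's Kappa: chance agreement from the product of marginal proportions.
p_e_kappa = sum((p_a[k] / n) * (p_b[k] / n) for k in cats)
kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)

# Gwet's AC1: chance agreement from mean category propensities,
# p_e = (1 / (K - 1)) * sum_k pi_k * (1 - pi_k).
K = len(cats)
pi = {k: (p_a[k] / n + p_b[k] / n) / 2 for k in cats}
p_e_ac1 = sum(pi[k] * (1 - pi[k]) for k in cats) / (K - 1)
ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)

print(f"percent agreement = {p_o:.2f}")    # high raw agreement (0.94)
print(f"Cohen's Kappa     = {kappa:.2f}")  # deflated by the skewed marginals
print(f"Gwet's AC1        = {ac1:.2f}")    # stays close to raw agreement
```

With 94% raw agreement, the skewed marginals push Kappa's chance term near 0.87, so Kappa drops sharply, whereas AC1's chance term stays small and AC1 remains close to the observed agreement, consistent with the underestimation/overcorrection pattern reported above.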
Collections
- OSU Dissertations