Multiple indices have been proposed to measure the amount of agreement between the ratings of two or more judges on a multi-item measure. Unfortunately, simulation work evaluating these indices is lacking, so we have little understanding of what should be expected of them and under what conditions they work. The present investigation seeks to bridge this gap in the literature by comparing several of the more commonly used measures of interrater agreement via an Item Response Theory (IRT) model. The goal is to identify which agreement indices best recover true agreement.
In this manuscript, several agreement indices are compared. Among these are the kappa coefficient κ_m (Fleiss, 1971); the intraclass correlation, ICC(2,1) (Shrout & Fleiss, 1979); several variants of the r_WG(J) index (James, Demaree, & Wolf, 1984; Lindell, Brandt, & Whitney, 1999); a measure of agreement for ordinal data (Stine, 1989); and an index derived from a Latent Trait Model (Terry, 2000). Results identify two measures of agreement that consistently recover true agreement. Implications and extensions to the measurement of agreement in multiple contexts are addressed.
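For readers unfamiliar with these indices, the sketch below illustrates how two of them are commonly computed from raw ratings. It is a minimal illustration, not the manuscript's own code: the function names are hypothetical, equal numbers of raters per subject are assumed for κ, and r_WG(J) is shown with the uniform (rectangular) null distribution, which is only one of the null variances considered in the literature.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories count matrix, where
    counts[i, k] is the number of raters placing subject i in category k
    (Fleiss, 1971). Assumes the same number of raters for every subject."""
    n_subjects = counts.shape[0]
    n_raters = counts.sum(axis=1)[0]
    # Proportion of all assignments falling in each category
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    # Observed agreement for each subject, then averaged
    P_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar = P_i.mean()
    # Chance agreement
    P_e = (p_j ** 2).sum()
    return (P_bar - P_e) / (1 - P_e)

def r_wg_j(ratings, n_options):
    """Multi-item r_WG(J) (James, Demaree, & Wolf, 1984) for a
    raters-by-items matrix of ratings on an A-point scale, using the
    uniform null variance sigma_E^2 = (A^2 - 1) / 12."""
    J = ratings.shape[1]
    mean_item_var = ratings.var(axis=0, ddof=1).mean()
    sigma_e_sq = (n_options ** 2 - 1) / 12.0
    ratio = mean_item_var / sigma_e_sq
    return (J * (1 - ratio)) / (J * (1 - ratio) + ratio)

# Example: four raters score a six-item measure on a 1-5 scale
ratings = np.array([[4, 5, 4, 4, 5, 4],
                    [4, 4, 4, 5, 5, 4],
                    [5, 4, 4, 4, 4, 5],
                    [4, 4, 5, 4, 5, 4]])
print(r_wg_j(ratings, n_options=5))
```

The ICC(2,1) of Shrout and Fleiss (1979) and the ordinal and IRT-based indices compared in the manuscript require additional modeling machinery (two-way ANOVA mean squares or latent trait estimation) and are not sketched here.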