Kaz's SAS, HLM, and Rasch Model
RELIABILITY MEASURE produced by WINSTEPS

How WINSTEPS derives a reliability measure

I asked Mike how Winsteps derives its reliability measure.  He told me this:
 
The "Observed Variance" is the variance of the person measures.
Error Variance = RMSE**2 = the average of the squared standard errors of the person measures
Then
Model reliability = "true" variance / observed variance = (OV-EV) / OV
 
I asked him whether I could construct the reliability measure from the person measure file that WINSTEPS produces as one of its output files.  He said yes, so I decided to try it myself, so that I would really understand what it takes to arrive at the reliability measure.  Here are some of the files from a recent WINSTEPS run for my study with Charles Bidwell, in which we created a teacher collegiality scale from NELS survey items:
The output file of my Winsteps run had this table.  See the highlighted part that says Teacher RELIABILITY  .76.  My goal here is to derive this number from the person measure file I showed you above (look at that file again).
 
TABLE 3.1 Teacher Collegiality                      output.txt Aug 18 10:31 2006
INPUT: 5657 Teachers, 8 Items  MEASURED: 4546 Teachers, 7 Items, 3 CATS     3.57.0
--------------------------------------------------------------------------------
 
     SUMMARY OF 4416 MEASURED (NON-EXTREME) Teachers
+-----------------------------------------------------------------------------+
|           RAW                          MODEL         INFIT        OUTFIT    |
|          SCORE     COUNT     MEASURE   ERROR      MNSQ   ZSTD   MNSQ   ZSTD |
|-----------------------------------------------------------------------------|
| MEAN      14.6       7.0         .59    1.06       .89    -.2    .91    -.2 |
| S.D.       2.3        .2        2.22     .22       .96    1.2   1.08    1.2 |
| MAX.      20.0       7.0        5.31    2.76      9.19    4.8   9.07    4.5 |
| MIN.       4.0       2.0       -5.31     .80       .00   -1.9    .00   -1.9 |
|-----------------------------------------------------------------------------|
| REAL RMSE   1.23  ADJ.SD    1.85  SEPARATION  1.51  Teacher RELIABILITY  .69 |
|MODEL RMSE   1.08  ADJ.SD    1.94  SEPARATION  1.79  Teacher RELIABILITY  .76 |
| S.E. OF Teacher MEAN = .03                                                  |
+-----------------------------------------------------------------------------+
  MAXIMUM EXTREME SCORE:    112 Teachers
  MINIMUM EXTREME SCORE:     18 Teachers
      LACKING RESPONSES:   1111 Teachers
        VALID RESPONSES:  99.8%
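
As a quick check, you can get very close to this number from the summary statistics in the table alone, before even opening the person measure file.  The S.D. of the measures is 2.22, so the observed variance is about 2.22 x 2.22 = 4.9284; the MODEL RMSE is 1.08, so the error variance is about 1.08 x 1.08 = 1.1664.  Then:

"true" variance = 4.9284 - 1.1664 = 3.7620
Model reliability = 3.7620 / 4.9284 = .763

which rounds to the reported .76.  (Notice also that the MODEL ADJ.SD of 1.94 is essentially the square root of this "true" variance: 1.94 x 1.94 = 3.7636.)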

I saved the person measure file as an Excel sheet so that I could do some calculations with it.  This is the Excel file (770 KB) you can download to see what I did.

 

This (see below) is how the Excel sheet looks.  The parts highlighted in yellow are the parts I added.  Let me explain what I did, one by one; the numbered headings below correspond to the numbers I handwrote on the graphic.
1. Column G (Squared Error): this is the square of ERROR.
=F2*F2 is what I entered, for example, in cell G2 to get the value 1.8225.  This is simply the result of 1.35 times 1.35.
 
2. OBSERVED VARIANCE
=VAR(B2:B4417)
This gives the variance of the person measures.
 
3. ERROR VARIANCE
=AVERAGE(G2:G4417)
This gives the average of the squared errors.
 
4. "true" VARIANCE
=P3-P5
We get "TRUE" VARIANCE if we do OBSERVED VARIANCE - ERROR VARIANCE.  (In reality true variance is never known, so that is why we say "true" instead of just true.)
 
5. MODEL RELIABILITY
=P8/P3
"True" variance divided by observed variance gives the reliability measure.  The value was:
0.76194
 
This value matches the statistic reported in the Winsteps table shown above.  (A SAS version of the same calculation appears after the graphic.  Also, please don't forget to read what I wrote below the graphic as my afterthought.)
 

[Screenshot of the Excel sheet: reliability2.jpg]
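
By the way, since this site is about SAS, here is a minimal SAS sketch of the same five steps.  It assumes the person measure file has already been read into a dataset named PFILE with variables MEASURE and ERROR; these names are my assumption, so adjust them to whatever your import step produces.

data person;
  set pfile;                      /* assumed dataset read from the Winsteps person measure file */
  sq_error = error * error;      /* step 1: squared standard error */
run;

proc means data=person noprint;
  var measure sq_error;
  output out=stats
    var(measure)=ov              /* step 2: observed variance = variance of the measures */
    mean(sq_error)=ev;           /* step 3: error variance = average squared error */
run;

data rel;
  set stats;
  true_var    = ov - ev;         /* step 4: "true" variance */
  reliability = true_var / ov;   /* step 5: model reliability */
run;

proc print data=rel noobs;
  var ov ev true_var reliability;
run;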

What did I learn?
 
I now feel I understand the concept of reliability well from actually "doing it."  In retrospect, what was counterintuitive to me was the fact that error variance (EV) and observed variance (OV) are on a common metric, meaning you can subtract one from the other to get something meaningful.  Also, it still feels a bit strange that you can call OV - EV the "true variance."  I will probably keep forgetting which one is supposed to be subtracted from which, but at least now I have a very intuitive feeling for what EV and OV are.
 
If someone asked me to explain what a reliability statistic is in fifteen seconds, I'd say this: when you look at a distribution of scores (e.g., test scores), some or even much of the spread could be the result of measurement error.  The reliability statistic gives us a sense of how much of the observed spread reflects real differences among people, rather than errors inherent in the measurement process.
 
Also, to understand the nature of this statistic, I'd look hard at this formula:
 
Reliability = (OV - EV) / OV
where OV means observed variance and EV means error variance
 
READ MORE ABOUT RELIABILITY
I believe an analogous reliability statistic (Cronbach's alpha) is computed when we run PROC CORR in SAS with the ALPHA option.  I wonder whether the reliability statistic could be high even when the survey items are poorly written or the respondents are lazy.  Imagine respondents saying YES, YES, YES to anything they read.  This would create a very small error variance and thus a very high reliability statistic...
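
Here is a sketch of that PROC CORR call; the dataset name TEACHERS and the item names Q1-Q8 are made up for illustration:

proc corr data=teachers alpha nomiss;
  var q1-q8;                     /* the survey items that make up the scale */
run;

The ALPHA option reports Cronbach's coefficient alpha which, like the Rasch person reliability above, can be read as a ratio of "true" variance to observed variance, though it is computed from the raw item scores rather than from Rasch measures and their standard errors.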

