I asked Mike how Winsteps derives its reliability measure. He told me this:
The "Observed Variance" is the variance of the person measures. Error Variance = RMSE**2 = the average of the squared standard errors of the person measures.
Then Model reliability = "true" variance / observed variance = (OV - EV) / OV, where OV is the observed variance and EV is the error variance.
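Mike's recipe fits in a few lines of Python. This is a minimal sketch, assuming you already have each person's measure and standard error in two lists; the function name model_reliability is my own invention, not part of Winsteps. It uses the population variance (dividing by n), which is my assumption about how Winsteps computes its S.D.; with thousands of persons that choice barely matters.

```python
from statistics import fmean, pvariance

def model_reliability(measures, errors):
    """Reliability = (OV - EV) / OV, following Mike's description (hypothetical helper)."""
    ov = pvariance(measures)            # observed variance of the person measures (divides by n)
    ev = fmean(e * e for e in errors)   # error variance = RMSE**2 = mean of squared standard errors
    return (ov - ev) / ov               # share of observed variance that is "true" variance

# Toy numbers, not from the NELS data
print(model_reliability([-1.0, 0.0, 1.0, 2.0], [0.5, 0.5, 0.5, 0.5]))  # 0.8
```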
I asked him whether I could compute the reliability measure myself from the person measure file that WINSTEPS writes out as one of its result files. He said yes, so I decided to try it, to make sure I understand what it takes to arrive at the reliability measure. Here are some of the files from a recent WINSTEPS run for my study with Charles Bidwell.
We created a teacher collegiality scale based on some NELS survey items:
The output file of my Winsteps run had the table below. Note the part that says Teacher RELIABILITY .76 (on the MODEL RMSE line). My goal here is to derive this number from the person measure file I showed above.
TABLE 3.1 Teacher Collegiality
output.txt  Aug 18 10:31 2006
INPUT: 5657 Teachers, 8 Items  MEASURED: 4546 Teachers, 7 Items, 3 CATS  3.57.0

SUMMARY OF 4416 MEASURED (NON-EXTREME) Teachers
          RAW                MODEL     INFIT        OUTFIT
        SCORE  COUNT MEASURE  ERROR   MNSQ   ZSTD   MNSQ   ZSTD
MEAN     14.6    7.0    .59   1.06    .89     .2    .91     .2
S.D.      2.3     .2   2.22    .22    .96    1.2   1.08    1.2
MAX.     20.0    7.0   5.31   2.76   9.19    4.8   9.07    4.5
MIN.      4.0    2.0  -5.31    .80    .00   -1.9    .00   -1.9
REAL RMSE   1.23  ADJ.SD 1.85  SEPARATION 1.51  Teacher RELIABILITY .69
MODEL RMSE  1.08  ADJ.SD 1.94  SEPARATION 1.79  Teacher RELIABILITY .76
S.E. OF Teacher MEAN = .03

MAXIMUM EXTREME SCORE: 112 Teachers
MINIMUM EXTREME SCORE: 18 Teachers
LACKING RESPONSES: 1111 Teachers
VALID RESPONSES: 99.8%
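Even before opening the person file, the summary lines of the table can be checked against Mike's formula. The sketch below plugs in the reported S.D. of MEASURE (2.22) and the two RMSE values. It also uses SEPARATION = ADJ.SD / RMSE, which is my understanding of Winsteps' definition, and its algebraic consequence RELIABILITY = SEPARATION**2 / (1 + SEPARATION**2). Finally, it checks that MODEL RMSE is the root of the average squared error rather than the average error, i.e. roughly sqrt(mean**2 + S.D.**2) of the ERROR column. The function name summary_check is hypothetical.

```python
import math

def summary_check(sd, rmse):
    """Recompute ADJ.SD, SEPARATION and RELIABILITY from the S.D. of measures and an RMSE."""
    ov = sd ** 2                      # observed variance of the measures
    ev = rmse ** 2                    # error variance
    true_var = ov - ev                # "true" variance
    adj_sd = math.sqrt(true_var)      # reported as ADJ.SD
    separation = adj_sd / rmse        # reported as SEPARATION (assumed definition)
    reliability = true_var / ov       # reported as RELIABILITY
    from_sep = separation ** 2 / (1 + separation ** 2)  # same number, via separation
    return adj_sd, separation, reliability, from_sep

for label, rmse in [("REAL", 1.23), ("MODEL", 1.08)]:
    adj_sd, sep, rel, rel2 = summary_check(2.22, rmse)
    print(f"{label}: ADJ.SD {adj_sd:.2f}  SEPARATION {sep:.2f}  "
          f"RELIABILITY {rel:.2f}  (via separation {rel2:.2f})")

# mean(e**2) = mean(e)**2 + variance(e), using the MEAN and S.D. of the ERROR column
rmse_from_errors = math.sqrt(1.06 ** 2 + 0.22 ** 2)
print(f"RMSE from the ERROR column: {rmse_from_errors:.2f}")
```

Because the inputs are rounded to two decimals, ADJ.SD and RELIABILITY reproduce the table (1.85 and .69, 1.94 and .76) and the RMSE comes out at 1.08, while SEPARATION prints as 1.50 and 1.80 instead of 1.51 and 1.79.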
I saved the person measure file as an Excel sheet so that I could do the calculations in Excel. This is the Excel file (770 KB) you can download to see what I did.
Below is what the Excel sheet looks like. The parts highlighted in yellow are the parts I added.
Let me explain what I did, one by one. The numbered headings correspond to the numbers I handwrote on the graphic.
1. Column G (Squared Error): the square of ERROR (column F).
=F2*F2
This is what I entered in cell G2, for example, which gave 1.8225, simply 1.35 times 1.35.
2. OBSERVED VARIANCE
=VAR(B2:B4417)
This gives the variance of the measures (column B).
3. ERROR VARIANCE
=AVERAGE(G2:G4417)
This gives the average of the squared errors, which is the error variance (RMSE**2).
4. "true" VARIANCE
=P3-P5
We get the "true" VARIANCE by computing OBSERVED VARIANCE - ERROR VARIANCE. (In reality the true variance is never known, which is why we write "true" in quotes rather than just true.)
5. MODEL RELIABILITY
=P8/P3
True VARIANCE divided by OBSERVED VARIANCE gives the reliability measure. The value I got was the same as the statistic reported in the Winsteps table above (Teacher RELIABILITY .76 on the MODEL RMSE line).
(Please don't forget to read my afterthoughts below the graphic.)
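The five spreadsheet steps can also be replayed outside Excel. This sketch assumes the sheet was exported as CSV with the measure and standard-error columns headed MEASURE and ERROR (adjust to your own headers), and that, like rows 2 to 4417 of my sheet, it contains only the non-extreme teachers. The function name and the four-row sample are hypothetical. Python's statistics.variance divides by n - 1, like Excel's VAR.

```python
import csv
import io
from statistics import fmean, variance

def reliability_from_csv(text):
    """Replay steps 1-5 of the spreadsheet on CSV text with MEASURE and ERROR columns.

    Assumes the rows hold only non-extreme persons, like rows 2-4417 of the sheet.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    measures = [float(r["MEASURE"]) for r in rows]
    squared_errors = [float(r["ERROR"]) ** 2 for r in rows]   # step 1: column G, =F2*F2
    observed_var = variance(measures)                         # step 2: =VAR(B2:B4417), divides by n - 1
    error_var = fmean(squared_errors)                         # step 3: =AVERAGE(G2:G4417)
    true_var = observed_var - error_var                       # step 4: =P3-P5
    return true_var / observed_var                            # step 5: =P8/P3

# Hypothetical four-row file, not real Winsteps output
sample = """ENTRY,MEASURE,ERROR
1,-1.00,0.50
2,0.00,0.50
3,1.00,0.50
4,2.00,0.50
"""
print(round(reliability_from_csv(sample), 4))  # 0.85
```

On a four-row toy file, the n - 1 divisor of VAR noticeably raises the result compared with a divide-by-n variance (Excel's VAR.P); with 4,416 teachers the two divisors differ by a factor of 4416/4415, which cannot move the reliability at two decimals.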
What did I learn?
I now feel I understand the concept of reliability well, having actually "done it." In retrospect,
what was counterintuitive to me was that error variance (EV) and observed variance (OV) can be put
on a common metric (meaning you can subtract one from the other to get something meaningful). It also still feels
a bit strange to call the result "true variance" when you compute OV - EV. I will probably keep forgetting which
one is subtracted from which. But at least now I have a very intuitive feeling for what EV and
OV are.
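Why can EV be subtracted from OV at all? In classical test theory, each observed score is treated as a true value plus an error that is independent of it, and the variances of independent quantities add, so OV = "true" variance + EV. A small simulation shows this. It is a sketch under invented assumptions (normal true measures with S.D. 2, independent normal errors with S.D. 1), not a model of the NELS data; under them OV should come out near 4 + 1 = 5 and the reliability near 0.8.

```python
import random
from statistics import pvariance

random.seed(1)                      # fixed seed so the run is repeatable
N = 20_000                          # number of simulated persons (invented)
true_vals = [random.gauss(0.0, 2.0) for _ in range(N)]   # "true" measures, variance 4
errors = [random.gauss(0.0, 1.0) for _ in range(N)]      # independent errors, variance 1
observed = [t + e for t, e in zip(true_vals, errors)]   # what we actually get to see

ov = pvariance(observed)            # observed variance, close to 4 + 1 = 5
ev = 1.0                            # error variance, known only because we simulated it
reliability = (ov - ev) / ov        # close to 4 / 5 = 0.8
print(f"OV = {ov:.2f}  EV = {ev:.2f}  reliability = {reliability:.3f}")
```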
If someone asked me to explain a reliability statistic in fifteen seconds, I'd say this: when
you look at a distribution of scores (e.g., test scores), it may look like a real spread, but some or even a lot of
that spread could come from errors in the scores. A reliability statistic tells us how much of the distribution
reflects real differences rather than the errors inherent in the measurement process.
Also, to understand the nature of this statistic, I'd look hard at this formula:
Reliability = (OV - EV) / OV
where OV means observed variance and EV means error variance.
I believe PROC CORR in SAS (with the ALPHA option) computes a closely related statistic, Cronbach's alpha, which applies the same "true variance over observed variance" idea to raw scores. I wonder whether a
reliability statistic could be high even when the survey items are poorly written or the respondents are lazy. Imagine respondents who
say YES YES YES to anything they read. That would create a very small error variance and thus a very high reliability statistic...
