kuekawa – Page 38 – My Statistical tools

Imagine you have 100 Excel files in a Windows folder (or even 100,000) and you need to write the names of those files in your SAS syntax (or in any other statistical software). Typing 100 names is time-consuming. Instead, you can use the Windows command prompt (cmd) to write the file names to a text file.

On Windows (7 in my case)

START --> RUN ..

Type "cmd" in the pop-up window and click OK.

You get a small black window. Use "cd" at the prompt to move to the folder you want.

Examples:

cd Music (You will go to the folder Music; this folder has to exist inside the folder you are currently in)
cd C:\temp (This takes you directly to that folder regardless of where you currently are in the folder structure, as long as it is on the same drive; use cd /d C:\temp to switch drives as well. C:\temp is just an example)
cd .. (You go up one level in the folder structure)

Once you are in the folder, run the following (this example lists text files that have the extension "txt", e.g., abc.txt).

dir *.txt > example_of_cmd_text.txt

This writes a listing of all files that end with the extension ".txt" into a text file; it does not copy the files themselves. The resulting text file will include the names of all matching files in the folder. This is an example:

http://www.nippondream.com/file/example_of_cmd_text.txt

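The listing comes with additional pieces of information you may not need (e.g., date), so you may want to open it with Excel and keep only the information you need. Alternatively, dir /b *.txt > example_of_cmd_text.txt writes the bare file names only.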
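If Python happens to be available, the same list can be produced without the extra columns. This is only a sketch of an alternative to the cmd approach: the function names are made up for illustration, and C:\temp is the example folder from above.

```python
from pathlib import Path


def list_file_names(folder, pattern="*.txt"):
    """Return the sorted names of the files in `folder` that match `pattern`."""
    return sorted(p.name for p in Path(folder).glob(pattern) if p.is_file())


def save_file_names(folder, pattern, output_path):
    """Write one matching file name per line to `output_path` and return the names."""
    names = list_file_names(folder, pattern)
    Path(output_path).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names


# Example call (hypothetical output file name):
# save_file_names(r"C:\temp", "*.txt", r"C:\temp\file_names.txt")
```

The output has one file name per line, so it can be pasted into SAS syntax directly.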

I googled for CMD commands tutorials:

December 28, 2015 (updated December 29, 2015)

Tricky things to describe about stat model specification

When describing statistical models and results in writing, the following issues are tricky and require decisions and a standardized way of describing them (the descriptions must be brief, intuitive, and full of meaning):

How do we choose omitted category/reference group?
Why is there no level-1 error term in logistic regression?
Why use HLM?
Why use logistic regression model?
Meaning of odds ratio
Effect size interpretation (Why 2.0 is often used)
Why use certain covariates
How do we talk about predictors, covariates, and the treatment indicator (1 if treatment subject; else 0)? There seems to be a difference between predictors and covariates.
How to discuss variance change (R², etc.)
Negative level-2 variance in the case of HLM
What do we do when between-group variance is small (the model may not converge)?
What to do when the model does not converge?
How to deal with model names such as HLM, HGLM, etc.
When converting a scale or ordinal variable into a binary variable as an outcome of logistic regression, there are many possible cutpoints to define 0 vs. 1 (low vs. high). How do we justify the choice?

December 18, 2015

SAFILE in Winsteps control file (Experiment)

Just an experiment with a Winsteps control option:
What happens if you enter random numbers in SAFILE?

The CATEGORY PROBABILITIES table gets strangely shaped curves.
Reliability gets wacky/low.

Conclusion: You will definitely notice that something went wrong.

Winsteps reference:
http://www.winsteps.com/winman/safile.htm

December 18, 2015 (updated March 22, 2016)

Advantages of Rasch model

Rasch model analysis has the following set of advantages:

Being logit scores, Rasch scores have no theoretical upper or lower boundary values (useful for statistical analysis)
Rasch scores facilitate pretest and posttest comparison based on different sets of test items (you can avoid giving the identical test at pre and post)
The Rasch model can handle missing values (as long as a subject is not missing all items)
The Rasch model (or Rasch model software) comes with an excellent set of diagnostic statistics to evaluate how well the model and data fit
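To illustrate the first point, here is a small sketch of the plain logit transformation. It only shows why a logit scale is unbounded; it is not how Winsteps estimates Rasch measures.

```python
import math


def logit(p):
    """Convert a proportion strictly between 0 and 1 into log-odds."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


# The closer p gets to 0 or 1, the further the logit moves from 0, without limit.
for p in (0.001, 0.25, 0.5, 0.75, 0.999):
    print(f"p = {p:<5}  logit = {logit(p):+.2f}")
```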

December 18, 2015 (updated September 27, 2018)

QC strategy for Rasch model results

BASIC QC

#01 CHECK RESPONSE VALUES (IF THEY ARE CODED INTO CORRECT NUMERALS)

Items used for Rasch model analysis are usually ordinal variables based on response values such as “Strongly agree,” “Agree,” “Disagree,” and “Strongly Disagree.” Code these so that higher agreement receives higher numbers:

Strongly agree 4
Agree 3
Disagree 2
Strongly Disagree 1

If by mistake these numbers are flipped, you will have a catastrophic situation where the result is flipped. Do two things to prevent such a catastrophe:

Confirm the coding by looking at the actual survey and by looking at the data (Look at it until your eyes bleed).
People tend to agree with items because they feel social pressure to report good things when taking a survey. Look at the original data and check that you see a lot of positive responses.
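The two checks above can be sketched roughly as follows. The response data are made up, and share_agreeing is just an illustrative helper name.

```python
from collections import Counter

# Coding from this post: higher agreement gets the higher number.
CODES = {
    "Strongly agree": 4,
    "Agree": 3,
    "Disagree": 2,
    "Strongly Disagree": 1,
}


def code_responses(labels):
    """Map response labels to numbers; an unexpected label raises KeyError."""
    return [CODES[label] for label in labels]


def share_agreeing(codes):
    """Proportion of responses coded 3 or 4 (Agree or Strongly agree)."""
    return sum(1 for c in codes if c >= 3) / len(codes)


# Made-up responses for one item.
labels = ["Agree", "Strongly agree", "Agree", "Disagree", "Agree", "Strongly Disagree"]
codes = code_responses(labels)
print(Counter(codes))
print(f"share agreeing: {share_agreeing(codes):.2f}")
```

If the share of agreeing responses looks surprisingly low across most items, that is a hint that the codes may have been flipped.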

#02 CHECK THE N OF SUBJECTS INCLUDED IN THE ANALYSIS

Check the output and confirm that the number of subjects used is correct. Checking the number of subjects is el numero uno protection against errors.

#03 CHECK THE N OF ITEMS INCLUDED IN THE ANALYSIS

Check the output and confirm that the number of items used is correct. Especially when you are not using all items' data in your analysis (you might have decided to drop some items), be sure you used the ones you wanted to use. With Winsteps, misspecification of a control file can lead to inclusion of subject IDs as response data by mistake. Avoid this (such a case will produce an extremely low reliability score).
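Checks #02 and #03 can also be run on the response data before it goes into the Rasch software. This is a rough sketch that assumes the data are already loaded as one list of item responses per subject; check_counts is a made-up name.

```python
def check_counts(rows, expected_subjects, expected_items):
    """Return a list of problems; an empty list means the counts match."""
    problems = []
    if len(rows) != expected_subjects:
        problems.append(f"expected {expected_subjects} subjects, found {len(rows)}")
    for index, row in enumerate(rows):
        if len(row) != expected_items:
            problems.append(
                f"row {index}: expected {expected_items} items, found {len(row)}"
            )
    return problems


# Made-up data: 2 subjects, 3 items each.
rows = [[4, 3, 2], [1, 2, 4]]
print(check_counts(rows, expected_subjects=2, expected_items=3))
```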

#04 CHECK WHAT VALUE WAS USED FOR MISSING SCORES

When a subject does not provide any response, Winsteps uses a token number (-2, I think) to indicate that it is a missing value. This value should be treated as a missing value and should NOT be included in the analysis dataset. If you treat a token value (-2 in the case of Winsteps) as a true value, you will have a catastrophic situation where an arbitrary value is used as a real data point. You should replace such a number with “.” (dot) before analysis because statistical software, such as SAS or SPSS, will treat a dot as a missing value.

Winsteps Reference: Definition of status variable in Winsteps output

When a subject lacks data, the missing value is indicated by -2.

http://www.winsteps.com/winman/ifile.htm
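A rough sketch of the replacement step, assuming the Winsteps output has already been read into a list of dictionaries. The keys "id", "measure", and "status" are hypothetical names for this illustration, not the actual Winsteps column layout.

```python
MISSING_STATUS = -2  # the status value this post associates with missing data


def mask_missing(records):
    """Return copies of the records with measure set to None when status is -2."""
    cleaned = []
    for record in records:
        record = dict(record)  # copy so the input is left untouched
        if record["status"] == MISSING_STATUS:
            record["measure"] = None
        cleaned.append(record)
    return cleaned


def to_sas_value(value):
    """Render None as the dot that SAS and SPSS read as a missing value."""
    return "." if value is None else str(value)


# Made-up records.
records = [
    {"id": "A01", "measure": 1.25, "status": 1},
    {"id": "A02", "measure": 0.0, "status": -2},
]
for record in mask_missing(records):
    print(record["id"], to_sas_value(record["measure"]))
```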

ADVANCED QC

Basic QC procedures should catch 99% of errors. The advanced ones are more intricate.

#05 INVESTIGATE ITEM DIFFICULTY SCORES

If you are using item difficulty parameters provided by the developer, compare them against the ones you derived from the dataset you collected. They should be more or less comparable. If not, investigate whether the discrepancy is caused by a data error.
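One way to make the comparison concrete is to flag items whose two difficulty estimates are far apart. In this sketch the 0.5 logit tolerance, the item names, and the difficulty values are all made up for illustration; pick a tolerance that makes sense for your instrument.

```python
def flag_discrepant_items(published, derived, tolerance=0.5):
    """Return the items whose two difficulty estimates differ by more than `tolerance`."""
    return sorted(
        item
        for item in published
        if item in derived and abs(published[item] - derived[item]) > tolerance
    )


# Made-up difficulties in logits.
published = {"item1": -0.80, "item2": 0.10, "item3": 0.40}
derived = {"item1": -0.70, "item2": 0.25, "item3": 1.20}
print(flag_discrepant_items(published, derived))
```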

#06 HISTORICALLY COMPARE RESULTS

If you are repeating the study, compare your results with historical data (e.g., last year’s result).

December 15, 2015

The Rasch-Andrich threshold (step parameter)

p.13 of

http://www.winsteps.com/a/winsteps-tutorial-2.pdf

Mike wrote this for me:

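log (Pnij / Pni(j-1) ) = Bn - Di - Fj, where Pnij is the probability that person n is observed in category j of item i, Bn is the person measure, Di is the item difficulty, and Fj is the Andrich Threshold between categories j-1 and j.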
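The formula can be turned into category probabilities by accumulating (Bn - Di - Fj) over the categories and normalizing. Below is a minimal sketch with made-up values for Bn, Di, and the thresholds; the function name is only for illustration.

```python
import math


def category_probabilities(b_n, d_i, thresholds):
    """Probabilities of categories 0..m given thresholds F1..Fm (in logits)."""
    # Category 0 has an empty sum (0); category j adds (Bn - Di - Fj) to category j-1.
    cumulative = [0.0]
    for f_j in thresholds:
        cumulative.append(cumulative[-1] + (b_n - d_i - f_j))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [x / total for x in numerators]


# Made-up person measure, item difficulty, and three thresholds (four categories).
probs = category_probabilities(1.0, 0.5, [-1.0, 0.0, 1.0])
for category, p in enumerate(probs):
    print(category, round(p, 3))
```

By construction, the log of the ratio of adjacent category probabilities gives back Bn - Di - Fj.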

December 2, 2015 (updated December 8, 2015)

Cronbach's alpha

The UCLA site explains Cronbach's alpha in terms of the average internal correlation among survey items. It also says that it is not a measure of unidimensionality. Rather, it is a measure of internal consistency (though just intuitively I feel that what is coherent tends to be also unidimensional). I think the point is that the measure is designed for the assessment of internal correlation, not dimensionality.

http://www.ats.ucla.edu/stat/spss/faq/alpha.html

Standardized versus Raw

This SAS website says one should use the standardized version of the measure (as opposed to raw).

https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_corr_sect032.htm

It says: "Because the variances of some variables vary widely, you should use the standardized score to estimate reliability."

A note to myself: Does this mean that if I standardize all items before the analysis, I get the same value for raw and standardized? I can test this with an experiment.
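A sketch of that experiment, using the textbook formulas for raw alpha (from item variances and total-score variance) and standardized alpha (from the mean inter-item correlation). The item scores are made up, and this is not SAS's implementation.

```python
from itertools import combinations
from statistics import mean, stdev, variance


def raw_alpha(items):
    """Raw alpha; `items` is a list of columns (one list of scores per item)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


def standardized_alpha(items):
    """Standardized alpha from the mean inter-item correlation."""
    k = len(items)
    r_bar = mean([pearson(x, y) for x, y in combinations(items, 2)])
    return k * r_bar / (1 + (k - 1) * r_bar)


def zscore(col):
    """Standardize one item to mean 0 and (sample) variance 1."""
    m, s = mean(col), stdev(col)
    return [(v - m) / s for v in col]


# Made-up scores: 3 items on different scales, 6 subjects.
items = [
    [4, 3, 2, 4, 1, 3],
    [8, 7, 3, 9, 2, 5],
    [3, 3, 1, 4, 2, 2],
]
print("raw alpha:                 ", raw_alpha(items))
print("standardized alpha:        ", standardized_alpha(items))
print("raw alpha on z-scored items:", raw_alpha([zscore(c) for c in items]))
```

Algebraically the answer is yes: once every item has variance 1, the covariances equal the correlations, and the raw formula reduces to the standardized one.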

December 1, 2015