Imputation of categorical variables

User 984 | 11/30/2014, 1:33:29 AM

The docs say that missing data is imputed using the mean of that column. When the column holds a categorical variable, does this mean that the mode is used to replace missing elements?

Comments

User 91 | 11/30/2014, 2:19:03 AM

We use reference encoding for categorical variables (see http://www.ats.ucla.edu/stat/sas/webbooks/reg/chapter5/sasreg5.htm).

For missing value imputation, we impute with the mean in the encoding space, which corresponds to the "mean" effect for a missing categorical variable.
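
To make the reply above concrete, here is a minimal sketch of mean imputation in the encoding space. It is not the library's actual code. The helper names `reference_encode` and `impute_column_means`, the use of NumPy, and the choice of the first level as the reference level are all assumptions made for illustration.

```python
import numpy as np

def reference_encode(values, levels):
    """Hypothetical helper: one indicator column per non-reference level.

    The first entry of `levels` is treated as the reference level (an
    assumption), so it encodes as all zeros. Missing values (None) become
    rows of NaN so they can be imputed afterwards.
    """
    non_ref = levels[1:]
    out = np.full((len(values), len(non_ref)), np.nan)
    for i, v in enumerate(values):
        if v is not None:
            out[i] = [1.0 if v == lvl else 0.0 for lvl in non_ref]
    return out

def impute_column_means(encoded):
    """Hypothetical helper: fill NaN entries with the mean of the observed
    entries in the same encoded column."""
    means = np.nanmean(encoded, axis=0)
    filled = np.where(np.isnan(encoded), means, encoded)
    return filled, means

# Toy column with one missing entry; "red" is the reference level.
colors = ["red", "green", "blue", "green", None]
encoded = reference_encode(colors, levels=["red", "green", "blue"])
filled, means = impute_column_means(encoded)

print(means)       # per-column means over the four observed rows
print(filled[-1])  # the imputed row for the missing entry
```

In this toy data the missing row is filled with the observed level frequencies (0.5 for "green", 0.25 for "blue") rather than with the indicator vector of the most frequent level, so the result is not the same as imputing the mode.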