User 2915 | 12/28/2015, 6:49:13 PM
Guys, I have a question about encoding categorical data! I believe sklearn treats everything as numbers for any model you try (regression, random forest). Okay, I see why you'd want to convert string categories (like male/female) to 0 and 1. My questions are:
What if I have a column with 6 categories that are already numbers (from 1 to 6)? Do I need to create dummy variables that contain ONLY 0 or 1, or should I leave the column as it is? Is categorical data coded as '1' and '2' the same as data coded as '0' and '1'? Or is the latter better, so that the model does not treat the codes as actual numbers? I'm confused.
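To make the question concrete, here is a minimal sketch of the two options I mean. The column values are made up, and `one_hot` is just a hypothetical helper written for illustration; I assume that in practice something like pandas `get_dummies` or sklearn's `OneHotEncoder` would do this step.

```python
# Made-up column: six categories already stored as the integers 1..6.
column = [1, 2, 3, 4, 5, 6, 2, 5]

# Option A: leave the column as it is (the model sees plain numbers).
as_is = column


def one_hot(values):
    """Hypothetical helper: build one 0/1 dummy column per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Option B: dummy variables that contain only 0 or 1.
dummies = one_hot(column)

# Print each original code next to its dummy row for comparison.
for original, row in zip(as_is, dummies):
    print(original, row)
```

In option B every row has exactly one 1, so no category looks "bigger" than another; in option A the codes keep their numeric order.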