How to normalize data for nearest neighbors

User 3031 | 1/15/2016, 5:01:45 PM

I'm taking the ML class on Coursera and am playing around with various ML methods in GraphLab Create. I noticed that graphlab.linear_regression.create has a feature_rescaling parameter that rescales the input features using the L2 norm, but no such parameter exists for graphlab.nearest_neighbors.create. This seems strange to me. Are the features scaled automatically? If not, is there a convenient way to scale them using SFrame / GraphLab functionality? Is there perhaps a universal normalizing or scaling function or method somewhere in the SFrame / GraphLab environment?

Comments

User 12 | 1/16/2016, 1:20:08 AM

Hi @optisizer, nearest neighbors does not automatically scale the features, although we've been thinking about adding this feature.

There aren't any scaling methods (yet) in the feature engineering module, so I think the best bet is to use SFrame methods. For example, to standardize every column to mean 0 and unit variance (this assumes all columns in sf are numeric):

for c in sf.column_names():
    sf[c] = (sf[c] - sf[c].mean()) / sf[c].std()
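
To see why the scaling matters for nearest neighbors, here is a minimal pure-Python sketch. It does not use GraphLab at all: a plain dict of lists stands in for an SFrame, and the helper names (standardize, nearest), the column names, and the toy values are all made up for illustration. It uses the population standard deviation and sets constant columns to zero to avoid dividing by zero; both choices are assumptions, not something taken from the library.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(columns):
    """Return a copy of `columns` with every column scaled to mean 0, std 1.

    `columns` is a dict of equal-length lists of numbers, a hypothetical
    stand-in for the numeric columns of an SFrame.
    """
    scaled = {}
    for name, values in columns.items():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation (assumption)
        if sigma == 0:
            # A constant column carries no information; avoid dividing by zero.
            scaled[name] = [0.0 for _ in values]
        else:
            scaled[name] = [(v - mu) / sigma for v in values]
    return scaled


def nearest(columns, query_row):
    """Index of the row closest to `query_row` by Euclidean distance."""
    names = list(columns)
    n_rows = len(columns[names[0]])

    def dist(i):
        return sqrt(sum((columns[c][i] - columns[c][query_row]) ** 2
                        for c in names))

    return min((i for i in range(n_rows) if i != query_row), key=dist)


# Toy data: sqft spans hundreds of units, bedrooms only a few.
data = {
    "sqft": [1000.0, 1010.0, 2000.0, 1500.0],
    "bedrooms": [1.0, 5.0, 1.0, 3.0],
}

raw_neighbor = nearest(data, 0)                  # sqft dominates the distance
scaled_neighbor = nearest(standardize(data), 0)  # both columns contribute
print(raw_neighbor, scaled_neighbor)
```

On the raw data the closest row to row 0 is chosen almost entirely by sqft, because its differences are orders of magnitude larger than those in bedrooms. After standardizing, both columns count on a comparable scale and a different row comes out as the nearest neighbor.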