Feature selection

User 5179 | 5/15/2016, 2:46:46 PM

Do you intend to add some feature selection algorithms, e.g. backward/forward selection, max-relevance min-redundancy... I believe filtering the feature space down to a subset with such algorithms, or with dimensionality reduction (I saw it is on the list), would speed up training, especially when using grid search.
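
For concreteness, here is a minimal sketch of greedy forward selection. It uses scikit-learn rather than GraphLab Create, and the dataset, model, and stopping rule are all illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # illustrative dataset

selected = []                      # indices of chosen features
remaining = set(range(X.shape[1]))
best_score = 0.0

# Greedy forward selection: repeatedly add the single feature that
# most improves cross-validated accuracy, until nothing helps.
while remaining:
    scores = {}
    for f in remaining:
        cols = selected + [f]
        model = LogisticRegression(max_iter=5000)
        scores[f] = cross_val_score(model, X[:, cols], y, cv=3).mean()
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_score:
        break                      # no remaining feature improves the score
    best_score = scores[f_best]
    selected.append(f_best)
    remaining.remove(f_best)

print("selected feature indices:", selected)
```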

Comments

User 1207 | 5/16/2016, 6:53:03 PM

Hello @Demir_Tonchev,

We have been investigating the algorithms you mentioned, and we are currently gathering data about specific use cases and domains. The challenge is that many of these methods perform poorly in most ML contexts (though very well in a few) and tend to give limited benefit overall, so we've put more emphasis on making sure our algorithms perform well even when not all of the features are informative.

One method that is effective, and frequently used in practice, is to look at the feature importance scores of a boosted tree model and use them to trim the feature space. These scores tend to be quite reliable for feature selection and apply to a number of different use cases.

Hope that helps! -- Hoyt


User 5179 | 5/16/2016, 7:28:26 PM

@hoytak I was thinking of something in that general direction. I still think that having more tools to select relevant features would be great.

Thanks for the answer.


User 1207 | 5/18/2016, 5:51:54 PM

@Demir_Tonchev,

Thank you for the thoughts! I forgot to mention that you can get the feature importances from the get_feature_importance() method of a trained boosted trees model, and use that information to select a subset of the most important features.
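
A minimal sketch of that workflow, assuming a GraphLab Create-style API; the SFrame `train`, the target column name, the importance column names, and the cutoff of 20 features are all illustrative assumptions:

```python
import graphlab as gl

# Train a boosted trees model on the full feature set.
# `train` and the target column name are illustrative.
model = gl.boosted_trees_classifier.create(train, target='label')

# get_feature_importance() returns per-feature importance scores;
# the column names 'name' and 'count' are assumed here.
importance = model.get_feature_importance()
importance = importance.sort('count', ascending=False)

# Keep the top 20 features (the cutoff is arbitrary).
top_features = list(importance['name'][:20])

# Retrain on the trimmed feature set.
trimmed_model = gl.boosted_trees_classifier.create(
    train, target='label', features=top_features)
```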

-- Hoyt