Gradient boosted trees

User 1067 | 3/18/2015, 3:28:39 PM


I read that you use XGBoost as part of your gradient boosted trees toolkit. I am currently experimenting with different Python machine learning tools, and I compared your gl.boosted_trees_classifier to XGBoost on the same dataset. It was slower, but the prediction error was better. Is that because gl.boosted_trees_classifier implements other models? Or is it just because I preprocessed my dataset to CSR format to use XGBoost?
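(The CSR conversion mentioned above can matter on its own. A small illustration, assuming NumPy and SciPy — this code is not from the thread: CSR stores only nonzero entries, and XGBoost treats absent entries as missing, learning a default split direction for them, so a toolkit that sees the explicit zeros can reach different predictions.)

```python
import numpy as np
from scipy.sparse import csr_matrix

# Converting a dense dataset to CSR drops explicit zeros: XGBoost treats
# absent entries as missing and learns a default direction for them, which
# can change predictions relative to a toolkit that sees the zeros.
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0]])
sparse = csr_matrix(dense)

print(dense.size)   # 6 values in the dense view
print(sparse.nnz)   # 3 stored values -- the zeros are gone
```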



User 19 | 3/18/2015, 4:02:20 PM

There are quite a few potential reasons, among them:

  • Unless you set validation_set=None, we automatically sample a validation set for you.
  • We have several preprocessing steps implemented to smoothly handle the various kinds of data that can appear in the input SFrame, e.g. handling missing data, encoding categorical variables, etc.

Do you see a difference on a per-tree basis (for the same-sized data set, parameter configuration, etc.)?

Thanks, Chris
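(One way to run the per-tree comparison suggested above is to track error after each boosting iteration. This sketch uses scikit-learn's GradientBoostingClassifier as a stand-in — it is not the API of either toolkit discussed in the thread — purely to illustrate comparing models tree by tree on identical data and parameters:)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Fix the data and the parameters so any difference shows up per tree.
X, y = make_classification(n_samples=500, random_state=0)

model = GradientBoostingClassifier(n_estimators=50, max_depth=3,
                                   learning_rate=0.1, random_state=0)
model.fit(X, y)

# staged_predict yields predictions after each boosting iteration,
# giving one training-error value per tree to compare across toolkits.
errors = [np.mean(pred != y) for pred in model.staged_predict(X)]
print(len(errors))  # one entry per tree
```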