BoostedTreesRegression and missing values

User 1766 | 4/16/2015, 12:38:04 AM

BoostedTreesRegression in GraphLab Create originated from https://github.com/dmlc/xgboost

The GitHub version can work with datasets that have missing values.

The GraphLab Create version cannot.

Problem with the GitHub version: it loads the entire dataset into memory, and if the dataset is big, my laptop with 16 GB of RAM stops the task.

Problem with GraphLab Create: it would be nice if I were not obliged to fill in missing values in some way (with the mean/median, or by using machine learning techniques to predict them).

Is it possible to use BoostedTreesRegression on a dataset with missing values?
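
For reference, the mean/median fill mentioned above could look roughly like the plain-Python sketch below. It is only an illustration under stated assumptions: `impute_column` is a hypothetical helper, not part of GraphLab Create or xgboost, and missing values are assumed to be represented as None.

```python
from statistics import mean, median


def impute_column(values, strategy="mean"):
    """Return a copy of `values` with None replaced by the column mean or median.

    Hypothetical helper for illustration; not a GraphLab Create or xgboost API.
    """
    # Only the observed (non-missing) entries contribute to the fill value.
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")

    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))

    # Leave observed values untouched and substitute the fill value elsewhere.
    return [fill if v is None else v for v in values]


ages = [20.0, None, 40.0, 90.0]
print(impute_column(ages, "mean"))
print(impute_column(ages, "median"))
```

This kind of fill has to be done per column before training, which is the extra step I would like to avoid.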

Comments

User 1190 | 4/17/2015, 8:56:35 PM

Hi @forseti,

Thank you for your feedback. We are planning to add support for missing values; it has been a high priority on our list. At the moment, you can try using SFrame.fillna with float('nan') to deal with missing values.

-jay