Extending model create methods to accept dataset as SFrame | KFold | tuple

User 2568 | 3/19/2016, 3:20:25 AM

Having used model_parameter_search extensively over the last few days, I found that accepting the dataset as SFrame | KFold | tuple works really well. It's simple to understand and very expressive as a way of describing the computation.

I propose that the model create methods work the same way. This has the following benefits:

1. The model create methods work the same as the parameter search methods.
2. I can write BoostedTreesClassifier.classify((train, validate), ....), which is a nice shorthand for BoostedTreesClassifier.classify(train, validation_set=validate, ....). The shorthand is perhaps more natural, as I can now write the following, which is a very elegant description of the computation:

        dataset = train_data.random_split(0.8)
        model = BoostedTreesClassifier.classify(dataset, ....)

3. For k-fold cross validation, I can write the following, which is simple and clear:

        kfold = train_data.kfold(5)
        model_list = BoostedTreesClassifier.classify(kfold, ....)

    which returns a list of models.

What do you think?


User 1190 | 3/19/2016, 8:19:20 PM

This is a good idea (I assume you mean create(), not classify(), above). However, when the input dataset is a KFold, the return type becomes list[model] instead of model. Overloading the return type of a function like this is a problem for people who consume the API. You can easily write a new create() wrapper that takes a KFold, just like what we did for model_parameter_search.
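For illustration, here is a minimal sketch of what such a dispatching wrapper could look like. Everything here is hypothetical: `create`, `train_one`, and the fold protocol are stand-ins, not the actual GraphLab Create API. It assumes a KFold behaves as an iterable of (train, validate) pairs.

```python
def train_one(train, validate, **kwargs):
    # Placeholder for the real training call, e.g. something like
    # graphlab.boosted_trees_classifier.create(train, validation_set=validate, ...).
    # Here we just record what we were given so the dispatch is observable.
    return {"train": train, "validate": validate, **kwargs}


def create(dataset, **kwargs):
    """Hypothetical wrapper accepting SFrame | (train, validate) tuple | KFold.

    Returns a single model for the first two cases, and a list of models
    (one per fold) for the KFold case -- the return-type overloading
    discussed above.
    """
    # (train, validate) tuple: unpack and train once.
    if isinstance(dataset, tuple):
        train, validate = dataset
        return train_one(train, validate, **kwargs)
    # KFold-like: an iterable of (train, validate) splits.
    # (A real implementation would check isinstance(dataset, KFold) instead.)
    if isinstance(dataset, list):
        return [train_one(train, validate, **kwargs)
                for train, validate in dataset]
    # Plain training table, no validation set.
    return train_one(dataset, None, **kwargs)
```

The design choice being debated is visible in the last two branches: callers must know that a KFold input yields a list rather than a single model, which is why a separately named wrapper (rather than overloading create() itself) may be the cleaner API.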

User 2568 | 3/19/2016, 9:59:10 PM

It's not unreasonable to expect an API to return a different data structure when you call it with a different signature.

User 2568 | 3/20/2016, 7:41:06 AM

Jay, I gave this more thought and realised that this could be made to work by thinking of these as a new model type, Ensemble. I've opened a new feature request for Ensemble here.