Plotting model training and validation accuracy by iteration

User 2568 | 1/10/2016, 10:13:25 PM

I was viewing this presentation on gradient boosted trees (GBT) to get some ideas on how to improve my model's accuracy and reduce overfitting.

I want to produce a chart like this. That is, I want to plot the model's training and validation accuracy by iteration.

The necessary data is calculated at each iteration and, with verbose=True, printed out. Is it accessible after the model has been created? Scikit-learn appears to return this in the model.

What I'd like to be able to write is something like this:

import graphlab as gl
import matplotlib.pyplot as plt

train, validate = data.random_split(0.8, seed=8754)
model = gl.boosted_trees_classifier.create(train, target='label', validation_set=validate)
score = model['score']  # Wished-for field (not available today): columns 'iteration', 'training', 'validation'
plt.plot(score['iteration'], score['training'], score['iteration'], score['validation'])

From this I can see the optimal iteration and get a better idea of overfitting. Returning this as part of the model would make this valuable data available at any time, which is especially useful when doing hyper-parameter searches.
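To make the request concrete, here is a minimal sketch of the per-iteration curve described above, written in plain Python. It assumes you can obtain the model's predictions after each boosting iteration as a sequence of prediction lists. The helper names accuracy_by_iteration and best_iteration, and the toy data, are hypothetical and are not part of GraphLab Create.

```python
def accuracy_by_iteration(staged_predictions, labels):
    """Return one accuracy value per boosting iteration.

    staged_predictions: iterable yielding, for each iteration, the
        predictions made by the model built up to that iteration.
    labels: the true labels, in the same order as the predictions.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    curve = []
    for preds in staged_predictions:
        correct = sum(1 for p, y in zip(preds, labels) if p == y)
        curve.append(correct / float(len(labels)))
    return curve


def best_iteration(validation_curve):
    """1-based index of the first iteration with the highest accuracy."""
    best = max(range(len(validation_curve)), key=validation_curve.__getitem__)
    return best + 1


# Toy data standing in for real staged predictions (illustrative only).
y_valid = [1, 0, 1, 1]
staged = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 1, 1], [1, 1, 1, 1]]
valid_curve = accuracy_by_iteration(staged, y_valid)
```

In scikit-learn, GradientBoostingClassifier.staged_predict(X) yields predictions of this kind, so accuracy_by_iteration(clf.staged_predict(X_valid), y_valid) should produce the validation curve, and the same call on the training data the training curve. Both can then be passed to matplotlib's plt.plot against range(1, len(valid_curve) + 1).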

Comments

User 91 | 1/13/2016, 6:31:32 PM

Thanks for catching this. This looks like a bug on our end. We expose this parameter for logistic regression, SVM, and linear regression, but it somehow isn't exposed for boosted trees and random forest.

We will work on fixing this bug in a future release. Thanks for pointing it out!


User 2568 | 1/14/2016, 4:45:27 AM

So what is the parameter, say with linear regression?


User 3031 | 1/16/2016, 8:35:59 AM

I would like to know the same.