Using random_search with a custom evaluator but without specifying a validation set fails

User 2568 | 3/18/2016, 5:03:42 AM

I submitted the job with a custom evaluator and no validation set:

job = gl.random_search.create(train_data, gl.boosted_trees_classifier.create,
                              params, evaluator=auc_eval, max_models=20,
                              perform_trial_run=False)

All the tasks failed at _model_parameter_search.py, line 340:

 evaluate_result = evaluator(model, training_set, validation_set)

I think that validation_set is None in _create_model_search at line 924. When None is passed to the custom evaluator, this raises an error. In my view it does not make sense to call random_search without specifying a validation set or KFold, so _create_model_search at line 924 should not set validation_set to None and should probably raise an error instead.

2016-03-18 15:38:06,963 [INFO] graphlab.deploy._executionenvironment, 262: Task execution failed.
 Traceback (most recent call last):
  File "/home/ec2-user/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/deploy/_executionenvironment.py", line 241, in _run_task
    result = code(**inputs)
  File "/home/ec2-user/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/_model_parameter_search.py", line 340, in _train_test_model
    evaluate_result = evaluator(model, training_set, validation_set)
  File "<ipython-input-3-c73657512aff>", line 3, in auc_eval
  File "/home/ec2-user/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/_model_workflow.py", line 16, in wrapper
    result = f(model, *args, **kwargs)
  File "/home/ec2-user/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/classifier/boosted_trees_classifier.py", line 287, in evaluate
    metric=metric)
  File "/home/ec2-user/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/_supervised_learning.py", line 178, in evaluate
    _raise_error_if_not_sframe(dataset, "dataset")
  File "/home/ec2-user/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/_internal_utils.py", line 346, in _raise_error_if_not_sframe
    raise ToolkitError(err_msg % variable_name)
ToolkitError: Input dataset is not an SFrame. If it is a Pandas DataFrame, you may use the to_sframe() function to convert it to an SFrame.

Error type    : ToolkitError
Error message : Input dataset is not an SFrame. If it is a Pandas DataFrame, you may use the to_sframe() function to convert it to an SFrame.

Comments

User 19 | 3/18/2016, 5:28:33 AM

Hi Kevin,

You are correct: when somebody writes a custom evaluator, the method expects it to gracefully handle the situation where validation_set is None. For example, the custom evaluation function in the user guide does include validation_set in its arguments, but we should have documented this behavior better.
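A defensive evaluator along these lines would avoid the crash. This is a sketch, not the documented API: the auc_eval name and the model.evaluate(..., metric='auc') call mirror the snippet at the top of the thread, and falling back to the training set when validation_set is None is one possible policy the evaluator author might choose.

```python
def auc_eval(model, training_set, validation_set):
    """Custom evaluator for model parameter search.

    random_search may call this with validation_set=None when no
    validation set (or KFold) was supplied, so guard against that
    before handing the dataset to model.evaluate().
    """
    # Assumed policy: fall back to the training set when no validation
    # set is available (an assumption for illustration, not documented
    # library behavior; raising an error here would also be reasonable).
    eval_set = validation_set if validation_set is not None else training_set
    return {'auc': model.evaluate(eval_set, metric='auc')['auc']}
```

Because the guard lives in the evaluator itself, the same function works whether the search was created with an explicit validation set, a KFold, or nothing at all.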

You are also correct that the method makes the most sense when passing in a train/test split or a KFold object, but we wanted to support the case where the user provides only a single SFrame.

Thank you for raising this issue, and please let me know if I missed anything. Chris


User 2568 | 3/18/2016, 5:43:32 AM

Chris, thanks. I read the user guide and I just can't see what you mean. The examples in the API guide aren't much help either.

Anyhow, I'm now aware of the problem and wanted to make sure others don't waste time on this.


User 19 | 3/18/2016, 5:49:59 AM

Hi Kevin,

Sorry for any inconvenience this may have caused. We will strive to make the documentation clearer on this point (or decide to raise an exception, as you describe).

Your feedback is invaluable, so please keep it coming! Chris