Hyperparameter tuning using model_parameter_search() function

User 1905 | 5/30/2015, 2:44:04 PM

Hi guys,

I'm trying to perform hyperparameter tuning of my boosted trees model using the model_parameter_search() function as described here (scroll down to Part II).

Anyway, I'm encountering a problem that I can't figure out. It seems like my job is failing to create results but I don't understand why.

I was wondering if someone could help take a peek at my code and point out any obvious problems. (I'm pretty new at this, so there probably are...)

Anyway, here it is:

def parameter_search(training, validation, target):
    """
    Return the optimal parameters in the given search space.
    The parameters returned have the lowest validation RMSE.
    """
    parameter_grid = {'features': ["features"],
                      'target': [target],
                      'max_depth': [10, 15, 20],
                      'min_child_weight': [5, 10, 20],
                      'step_size': [0.05],
                      'max_iterations': [10]}

    job = gl.model_parameter_search.grid_search.create((training, validation),
                                                       model_factory=gl.boosted_trees_regression.create,
                                                       model_parameters=parameter_grid,
                                                       return_model=False)

    # When the job is done, the result is a dictionary containing all the
    # generated models and an SFrame summarizing the metrics for each
    # parameter set.
    summary = job.get_results()

    sorted_summary = summary.sort('validation_rmse', ascending=True)
    print sorted_summary

    optimal_model_idx = sorted_summary[0]['model_id']

    # Return the parameters with the lowest validation error.
    optimal_params = sorted_summary[['max_depth', 'min_child_weight']][0]
    optimal_rmse = sorted_summary[0]['validation_rmse']

    print 'Optimal parameters: %s' % str(optimal_params)
    print 'RMSE: %s' % str(optimal_rmse)
    return optimal_params
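(For what it's worth, the selection at the end of the function is just an argmin over validation RMSE. A minimal pure-Python sketch of that logic, using made-up parameter combinations and RMSE values rather than a real summary SFrame:)

```python
# Each entry pairs a parameter combination with its validation RMSE.
# The values here are hypothetical, standing in for the rows of the
# summary SFrame returned by job.get_results().
results = [
    ({'max_depth': 10, 'min_child_weight': 5}, 4.21),
    ({'max_depth': 15, 'min_child_weight': 10}, 3.87),
    ({'max_depth': 20, 'min_child_weight': 20}, 4.05),
]

# Equivalent to sorting ascending by 'validation_rmse' and taking row 0.
best_params, best_rmse = min(results, key=lambda row: row[1])
print('Optimal parameters: %s' % best_params)
print('RMSE: %s' % best_rmse)
```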

This is how I execute the search:

training, validation = training_sframe.random_split(0.8, seed=1)

params = parameter_search(training,
                          validation,
                          target="category")

And this is the error I get:

	[INFO] Validating job.
	[INFO] Validation complete. Job: 'Model-Parameter-Search-May-30-2015-17-19-3700000' ready for execution
	[INFO] Job: 'Model-Parameter-Search-May-30-2015-17-19-3700000' scheduled.
	[WARNING] Trial run failed prior to launching model parameter search.  Please check for exceptions using get_metrics() on the returned object.
	No valid results have been created from this search.
	No valid results have been created from this search.
	---------------------------------------------------------------------------
	AttributeError                            Traceback (most recent call last)
	<ipython-input-25-75c7bbecba47> in <module>()
	      3 params_log_casual = parameter_search(training,
	      4                                      validation,
	----> 5                                      target="category")

	<ipython-input-24-f08b08678f85> in parameter_search(training, validation, target)
	     20     summary = job.get_results()
	     21 
	---> 22     sorted_summary = summary.sort('validation_rmse', ascending=True)
	     23     print sorted_summary
	     24 

	AttributeError: 'NoneType' object has no attribute 'sort'

Comments

User 91 | 5/30/2015, 6:46:56 PM

It seems that your job failed, based on the message "Trial run failed prior to launching model parameter search. Please check for exceptions using get_metrics()". Try running the model parameter search in trial mode to figure out what the error is.


User 1905 | 5/31/2015, 2:00:24 PM

Thank you for the response.

So I added the argument to run the search in trial mode as you suggested but still get the same error.

Here is what my line looks like now:

job = gl.model_parameter_search.grid_search.create((training, validation),
                                                   model_factory=gl.boosted_trees_regression.create,
                                                   model_parameters=parameter_grid,
                                                   return_model=False,
                                                   perform_trial_run=True)

User 19 | 5/31/2015, 5:12:45 PM

In that error message, it mentions "Please check for exceptions using get_metrics() on the returned object." For example, in your case it would be job.get_metrics().

Have you had a chance to do this? This should give us a hint why the trial run jobs are failing.


User 1905 | 6/1/2015, 2:50:31 PM

Thanks!

So here's the relevant output:

	[INFO] Start server at: ipc:///tmp/graphlab_server-3842 - Server binary: /Users/Saar/anaconda/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1433169563.log
	[INFO] GraphLab Server Version: 1.4.0
	[INFO] Task started: _train_test_model-0-0, output path: /Users/Saar/.graphlab/artifacts/results/job-results-aabd2719-90a1-44f4-a8c0-aeb376b04490/output/_train_test_model-0-0-0-0-1433169525.22.gl
	[INFO] Task execution failed.
	Traceback (most recent call last)
	 Traceback (most recent call last):
	  File "/Users/Saar/anaconda/lib/python2.7/site-packages/graphlab/deploy/_executionenvironment.py", line 184, in _run_task
	    result = code(**inputs)
	  File "/Users/Saar/anaconda/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/_model_parameter_search.py", line 328, in _train_test_model
	    model = model_factory(training_set, **model_parameters)
	  File "/Users/Saar/anaconda/lib/python2.7/site-packages/graphlab/toolkits/regression/boosted_trees_regression.py", line 482, in create
	    verbose = verbose, **kwargs)
	  File "/Users/Saar/anaconda/lib/python2.7/site-packages/graphlab/toolkits/_supervised_learning.py", line 369, in create
	    raise TypeError("Invalid feature %s: Feature names must be of type str" % x)
	TypeError: Invalid feature {'X998': 2, 'X1455': 1, 'X184': 1, 'X997': 1, 'X3884': 1, 'X2733': 2, 'X4213': 1, 'X9816': 1, 'X748': 5, 'X7247': 1, 'X3154': 1, 'X1428': 1, 'X8034': 6, 'X292': 1, 'X6025': 1, 'X112': 1, 'X1424': 4, 'X31': 5, 'X36': 12, 'X7386': 208, 'X1761': 1, 'X676': 2, 'X675': 1, 'X401': 2, 'X1045': 5, 'X104': 1, 'X106': 7, 'X9352': 1, 'X103': 5, 'X102': 7, 'X5741': 4, 'X770': 1, 'X6988': 16, 'X3801': 1, 'X1927': 1, 'X2127': 1, 'X4101': 1, 'X7396': 28, 'X10002': 0, 'X2417': 4, 'X25': 1, 'X766': 1, 'X1778': 1, 'X1164': 1, 'X3477': 2, 'X170': 7, 'X2951': 1, 'X175': 7, 'X179': 18, 'X2810': 1, 'X4030': 1, 'X1563': 1, 'X980': 27, 'X279': 8, 'X1568': 1, 'X22': 10, 'X270': 5, 'X3750': 4, 'X3505': 1, 'X51': 3, 'X1277': 15, 'X7798': 1, 'X880': 1, 'X59': 2, 'X493': 2, 'X491': 8, 'X490': 2, 'X2468': 1, 'X316': 16, 'X310': 1, 'X473': 1, 'X312': 3, 'X411': 3, 'X241': 1, 'X416': 5, 'X61': 4, 'X60': 1, 'X10': 1, 'X2363': 1, 'X3606': 28, 'X4160': 34, 'X3191': 2, 'X257': 2, 'X1914': 1, 'X250': 1, 'X3110': 6, 'X159': 7, 'X4084': 1, 'X15': 3, 'X461': 3, 'X8': 1, 'X4914': 1, 'X9521': 2, 'X3696': 1, 'X3': 192, 'X9401': 1, 'X787': 63, 'X2658': 1, 'X4': 106, 'X308': 1, 'X1150': 1, 'X3449': 9, 'X2351': 1, 'X3735': 12, 'X544': 3, 'X3202': 2, 'X2849': 3, 'X5541': 4, 'X6993': 1, 'X141': 10, 'X636': 1, 'X23': 2, 'X209': 5, 'X632': 1, 'X7': 62, 'X8603': 2, 'X1967': 1, 'X1795': 11, 'X5': 38, 'X2041': 3, 'X537': 6, 'X3538': 2, 'X1733': 5, 'X2238': 6, 'X3783': 3, 'X7113': 8, 'X9': 1, 'X832': 2, 'X83': 3, 'X80': 1, 'X6': 2, 'X1570': 1, 'X680': 1, 'X681': 1, 'X329': 4, 'X442': 2, 'X1978': 1, 'X192': 1, 'X280': 2, 'X7044': 3, 'X419': 1, 'X18': 1, 'X3549': 8, 'X5648': 1, 'X13': 2, 'X9664': 52, 'X17': 26, 'X91': 5, 'X8357': 1, 'X144': 1, 'X84': 10, 'X451': 1, 'X9076': 1, 'X127': 1, 'X126': 2, 'X121': 8}: Feature names must be of type str

Apparently it wants my feature names to be strings. But when I train a boosted trees classifier directly (the traditional way), it has no problem taking this data type.


User 19 | 6/1/2015, 5:40:11 PM

Hi Saarkagan,

I think one issue may be the value of your 'features' argument. Every value you provide in the grid needs to be a list of candidate parameter values to search over. So if you want to specify that the only feature column to use is "features", you should use double brackets, for example:

parameter_grid = {'features': [["features"]], 'target': [target], 'max_depth': [10, 15, 20], ...

If you wanted to try different sets of features, you would have a list of lists: {'features': [["features"], ["features", "my_other_features"]], ...}
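(To see why the double brackets matter: each element of a grid list is one candidate value, and the search takes the Cartesian product across the lists. With ['features'], the candidate passed to the model is the bare string 'features'; with [['features']], the candidate is the list ['features']. A rough illustration of that expansion in plain Python, not GraphLab's actual implementation:)

```python
import itertools

# Double brackets: the single candidate for 'features' is a list of
# column names, which is what the model factory expects.
parameter_grid = {
    'features': [['features']],
    'max_depth': [10, 15],
    'min_child_weight': [5, 10],
}

# Expand the grid: one dict per combination of candidate values.
keys = sorted(parameter_grid)
combinations = [dict(zip(keys, values))
                for values in itertools.product(*(parameter_grid[k] for k in keys))]

print(len(combinations))  # 1 * 2 * 2 = 4 combinations
```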

Try that and let us know how it goes! Chris


User 1905 | 6/9/2015, 9:35:03 PM

Guys, it works.

Thank you very much!


User 19 | 6/9/2015, 10:51:09 PM

Great! Let us know if you have any more questions.