Failed to run random_search.create after upgrading from 1.8.5 to 1.9

User 5163 | 4/29/2016, 11:16:59 AM

I'm using the dataset from Kaggle's Give Me Some Credit competition (cs-training.csv). Following the GraphLab documentation, I used the code below to run a random hyperparameter search with cross-validation:

```python
import graphlab as gl

# sf: SFrame loaded from cs-training.csv, e.g. gl.SFrame.read_csv('cs-training.csv')
folds = gl.cross_validation.KFold(sf, 5)

params = {'target': 'SeriousDlqin2yrs',
          'class_weights': 'auto',
          'max_depth': [3, 4, 6, 8, 10, 12, 18],
          'max_iterations': [10, 20, 50],
          'min_loss_reduction': [0, 2, 5, 8],
          'step_size': [0.05, 0.1, 0.2, 0.3, 0.5],
          'column_subsample': [0.6, 0.8, 1.0],
          'row_subsample': [0.6, 0.8, 1.0],
          'min_child_weight': [0.1, 2, 5, 8]}

job = gl.random_search.create(folds, gl.boosted_trees_classifier.create, params)
```

With version 1.8.5 the code ran without any errors. After I upgraded GraphLab to version 1.9, I got the following error:


```
TypeError                                 Traceback (most recent call last)
<ipython-input-80-9fe6b728419c> in <module>()
     14     folds,
     15     gl.boosted_trees_classifier.create,
---> 16     params)

/Users/gg/dato-env/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/_random_search.pyc in create(datasets, model_factory, model_parameters, evaluator, environment, return_model, perform_trial_run, max_models)
    147         environment=environment,
    148         return_model=return_model,
--> 149         perform_trial_run=perform_trial_run)
    150

/Users/gg/dato-env/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/_model_parameter_search.pyc in _create_model_search(datasets, model_factory, model_parameters, strategy, evaluator, environment, return_model, perform_trial_run)
    990         name=job_name,
    991         environment=environment,
--> 992         return_model=return_model)
    993     return m

/Users/gg/dato-env/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/_model_parameter_search.pyc in __init__(self, factory, parameter_sets, name, strategy, environment, return_model)
    571         # Tuning parameter for dividing jobs into batches
    572         batch_size = max(10, math.ceil(len(parameter_sets) / 3.0))
--> 573         parameter_batches = [c for c in chunks(parameter_sets, batch_size)]
    574
    575         # Construct jobs

/Users/gg/dato-env/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/_model_parameter_search.pyc in chunks(l, n)
    566     Yield successive n-sized chunks from l.
    567     """
--> 568     for i in xrange(0, len(l), n):
    569         yield l[i:i+n]
    570

TypeError: integer argument expected, got float
```
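For context, a random search draws parameter combinations from the lists in `params`, and the traceback shows that the resulting `parameter_sets` are then split into batches. The sketch below is a hypothetical stand-in for the sampling step, not GraphLab's implementation: it assumes each list is sampled uniformly and independently, that non-list values such as `'target'` are copied into every combination, and that `max_models` (a name taken from the `create` signature in the traceback) simply sets how many combinations are drawn.

```python
import random

def sample_parameter_sets(params, max_models=10, seed=None):
    """Draw max_models random combinations from a GraphLab-style params dict.

    Hypothetical sketch: each list is sampled uniformly and independently;
    any non-list value (such as 'target') is copied into every combination.
    The default of 10 is an arbitrary choice, not GraphLab's default.
    """
    rng = random.Random(seed)
    parameter_sets = []
    for _ in range(max_models):
        combo = {}
        for name, value in params.items():
            combo[name] = rng.choice(value) if isinstance(value, list) else value
        parameter_sets.append(combo)
    return parameter_sets
```

With the `params` above, `sample_parameter_sets(params, max_models=30)` returns 30 dicts; a count like that is what `len(parameter_sets)` feeds into the batch-size line in the traceback.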

Comments

User 19 | 4/29/2016, 4:52:41 PM

Hi ggony,

Thanks for reporting this! I've confirmed this is a new bug.

The fastest workaround right now is to make a small change to the Python package. You'll want to edit a file called `_model_parameter_search.py`. If you are using Miniconda, it will be located in `~/miniconda2/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/`; if you are using a conda or virtualenv environment, it will be in the same subfolder of that environment's site-packages folder.
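If you're not sure which site-packages folder is the right one, you can ask the interpreter that runs GraphLab where it loaded the module from. A minimal sketch, assuming the module path `graphlab.toolkits.model_parameter_search._model_parameter_search` implied by the traceback; `module_file` is a hypothetical helper name, and the code runs on both Python 2.7 and 3.

```python
import importlib

def module_file(dotted_name):
    """Return the file a module was loaded from, or None if it can't be imported."""
    try:
        module = importlib.import_module(dotted_name)
    except ImportError:
        return None
    # Built-in modules (e.g. sys) have no __file__ attribute.
    return getattr(module, "__file__", None)

if __name__ == "__main__":
    print(module_file("graphlab.toolkits.model_parameter_search._model_parameter_search"))
```

On Python 2 the printed path may end in `.pyc`; in that case, edit the `.py` file next to it.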

In the file, please edit

```python
batch_size = max(10, math.ceil(len(parameter_sets) / 3.0))
```

to be

```python
batch_size = max(10, int(math.ceil(len(parameter_sets) / 3.0)))
```
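Why the `int()` matters: on Python 2, `math.ceil` returns a float, and `xrange` (used by `chunks` in the traceback) only accepts integers, hence `integer argument expected, got float`. The sketch below mirrors the patched batching logic from the traceback; `batch_parameter_sets` is a hypothetical wrapper name, and `range` stands in for `xrange` so it also runs on Python 3, where `range` likewise rejects a float step.

```python
import math

def chunks(l, n):
    """Yield successive n-sized chunks from l (n must be an int)."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def batch_parameter_sets(parameter_sets):
    """Split parameter sets into batches using the patched batch-size rule."""
    # At least 10 sets per batch, otherwise roughly a third of the total.
    # int() is the fix: Python 2's math.ceil returns a float.
    batch_size = max(10, int(math.ceil(len(parameter_sets) / 3.0)))
    return [c for c in chunks(parameter_sets, batch_size)]
```

For 100 parameter sets this gives a batch size of 34 and batches of 34, 34 and 32.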

Let me know if you have any trouble with this. I am happy to help.

Sorry about the inconvenience here.

Chris


User 5163 | 4/29/2016, 5:30:20 PM

Hi Chris,

Problem solved! Thank you for your help.

Gonçalo


User 5191 | 5/13/2016, 12:58:27 AM

Unfortunately, I'm still getting this error after making the correction you suggested to `batch_size`. I made the edit and reloaded the module, to no avail; I'm going to look through `_model_parameter_search.py` more tonight. Any help would be greatly appreciated. Thank you!

```python
import graphlab as gl

folds = gl.cross_validation.KFold(regression_data, 5)
params = {'target': 'y', 'max_depth': range(1, 6)}
job2 = gl.model_parameter_search.create(folds, gl.boosted_trees_regression.create, params)
```

```
2016-05-12 17:42:31,691 [INFO] graphlab.deploy.job, 22: Validating job.
2016-05-12 17:42:31,715 [INFO] graphlab.deploy.map_job, 186: Validation complete. Job: 'Model-Parameter-Search-May-12-2016-17-42-3100000' ready for execution
2016-05-12 17:42:32,336 [INFO] graphlab.deploy.map_job, 192: Job: 'Model-Parameter-Search-May-12-2016-17-42-3100000' scheduled.

TypeError                                 Traceback (most recent call last)
<ipython-input-100-6fc72f9b7856> in <module>()
      1 folds = gl.cross_validation.KFold(regression_data,5)
      2 params = {'target': 'y', 'max_depth' : range(1,6)}
----> 3 job2 = gl.model_parameter_search.create(folds, gl.boosted_trees_regression.create, params)

/net/shendure/vol1/home/hauser/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/__init__.pyc in create(datasets, model_factory, model_parameters, evaluator, environment, perform_trial_run, return_model, max_models)
    300         return_model=return_model,
    301         perform_trial_run=perform_trial_run,
--> 302         max_models=max_models)

/net/shendure/vol1/home/hauser/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/_random_search.pyc in create(datasets, model_factory, model_parameters, evaluator, environment, return_model, perform_trial_run, max_models)
    147         environment=environment,
    148         return_model=return_model,
--> 149         perform_trial_run=perform_trial_run)
    150

/net/shendure/vol1/home/hauser/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/_model_parameter_search.pyc in _create_model_search(datasets, model_factory, model_parameters, strategy, evaluator, environment, return_model, perform_trial_run)
    990         name=job_name,
    991         environment=environment,
--> 992         return_model=return_model)
    993     return m

/net/shendure/vol1/home/hauser/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/_model_parameter_search.pyc in __init__(self, factory, parameter_sets, name, strategy, environment, return_model)
    571         # Tuning parameter for dividing jobs into batches
    572         batch_size = max(10, int(math.ceil(len(parameter_sets) / 3.0)))
--> 573         parameter_batches = [c for c in chunks(parameter_sets, batch_size)]
    574
    575         # Construct jobs

/net/shendure/vol1/home/hauser/anaconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/model_parameter_search/_model_parameter_search.pyc in chunks(l, n)
    566     Yield successive n-sized chunks from l.
    567     """
--> 568     for i in xrange(0, len(l), n):
    569         yield l[i:i+n]
    570

TypeError: integer argument expected, got float
```


User 12 | 5/13/2016, 6:37:25 PM

Hi @proteogenomics, can you verify that the edit is actually present in the code you're running? If you're using IPython, try typing `psource gl.toolkits.model_parameter_search.ModelSearchJob` in the console and check line 572 for the `int` conversion of `batch_size`. The change did work for me, but the file was in a different place from what @ChrisDuBois described above.

Thanks, Brian
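Outside the IPython console, the standard `inspect` module can do a similar check. A minimal sketch; `find_source_lines` is a hypothetical helper, and the `ModelSearchJob` path is the one Brian mentions. Like `psource`, it reads the source file on disk, so it confirms that the edit was saved, not that the running interpreter has loaded it.

```python
import inspect

def find_source_lines(obj, needle):
    """Return (line_number, text) pairs from obj's source file that contain needle."""
    lines, start = inspect.getsourcelines(obj)
    return [(start + i, line.rstrip()) for i, line in enumerate(lines) if needle in line]

# Usage with GraphLab (path as given in the thread):
# import graphlab as gl
# print(find_source_lines(gl.toolkits.model_parameter_search.ModelSearchJob, "batch_size ="))
```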


User 5191 | 5/16/2016, 5:50:51 PM

I feel rather silly: simply restarting IPython and the corresponding kernel fixed it. Despite "updating" from within my IPython notebook, for some reason it wasn't using the updated code.
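One likely reason, shown in the sketch below: Python keeps an imported module's code in memory, so editing the `.py` file afterwards changes nothing until that specific module is reloaded or the interpreter restarts. Tracebacks, by contrast, print source lines read from the file at display time, which would explain why the second traceback already shows the `int(...)` line while the old code still ran. The demo uses a throwaway module named `hypothetical_batching` written to a temp directory; it targets Python 3 (`importlib.reload`), and on Python 2.7 the built-in `reload()` plays the same role.

```python
import importlib
import os
import sys
import tempfile

OLD_SOURCE = "def batch_size(n):\n    return max(10, n / 3.0)\n"
NEW_SOURCE = ("import math\n"
              "def batch_size(n):\n"
              "    return max(10, int(math.ceil(n / 3.0)))\n")

def demo_stale_module():
    """Return batch_size(60) before the edit, after the edit, and after reload."""
    saved_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True          # keep .pyc caching out of the demo
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "hypothetical_batching.py")
    with open(path, "w") as f:
        f.write(OLD_SOURCE)
    sys.path.insert(0, tmpdir)
    sys.modules.pop("hypothetical_batching", None)
    importlib.invalidate_caches()           # let the import system see the new file
    try:
        mod = importlib.import_module("hypothetical_batching")
        before = mod.batch_size(60)         # old code: returns a float

        with open(path, "w") as f:          # "fix" the file on disk
            f.write(NEW_SOURCE)
        after_edit = mod.batch_size(60)     # still the old, in-memory function

        mod = importlib.reload(mod)         # re-execute the module from disk
        after_reload = mod.batch_size(60)   # new code: returns an int
    finally:
        sys.path.remove(tmpdir)
        sys.dont_write_bytecode = saved_flag
    return before, after_edit, after_reload

print(demo_stale_module())  # (20.0, 20.0, 20)
```

Reloading one submodule does not update objects that other modules already imported from it, so restarting the kernel, as done here, is the more reliable fix.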