Model parameter search in an EC2 environment: is it possible to upload KFold data tuples to an S3 path?

User 3147 | 2/11/2016, 9:59:38 AM

Hi all, I am currently trying to deploy a rather tedious model parameter search in an EC2 environment. I have configured the EC2 cluster correctly and have already submitted the corresponding job for execution. I am wondering, though, whether there is also a way to save the KFold data tuples to an S3 path prior to executing the task. I expect it would be much more efficient that way. Perhaps I am asking something naive, but I can't see how it would be possible. Any help?

Comments

User 2593 | 2/23/2016, 6:14:23 PM

Hi @theod,

One thing you can do is create the folds before running the parameter search and write them separately to S3. Then, from within your grid search job, you can read those chunks back.
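A minimal sketch of that approach, assuming GraphLab Create is imported as `gl`, five folds, and a placeholder bucket path (`s3://my-bucket/folds/...` is an assumption, not a real location):

```python
import graphlab as gl

# Build the folds once, up front (data and the fold count of 5 are assumptions)
folds = gl.cross_validation.KFold(data, 5)

# Write each fold's train/validation SFrames to S3 before launching the search;
# the bucket path below is a placeholder
for i, (train, valid) in enumerate(folds):
    train.save('s3://my-bucket/folds/train_%d' % i)
    valid.save('s3://my-bucket/folds/valid_%d' % i)
```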

Let me know if you have any questions and thanks!

Charlie


User 3147 | 2/23/2016, 7:54:17 PM

But I don't think this is possible. The KFold object has no .save() attribute. Of course, I can save each training/validation data set separately, since they are all SFrames, but what about the KFold object itself? Am I missing something?


User 19 | 2/23/2016, 8:50:30 PM

Hi theod,

You are correct: the current KFold object has no .save() method. I suggest iterating through the folds and saving each train/validation set separately. Then you can pass this list of splits into your model parameter search.

```python
# Save each fold's train/validation SFrames separately
for train, valid in folds:
    train.save('somewhere')
    valid.save('somewhere')

# Later, pass the list of saved splits into the parameter search
folds = [(train1, valid1), ..., (train5, valid5)]
j = gl.model_parameter_search.create(folds, gl.foobar.create, params)
```
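If the splits were written to S3 as above, one way to reload them inside the search job might look like the following sketch; the paths and fold count are placeholders, and `gl.foobar.create` stands in for your actual model constructor, as in the snippet above:

```python
# Reload the saved splits from S3 (paths are placeholders)
folds = [(gl.load_sframe('s3://my-bucket/folds/train_%d' % i),
          gl.load_sframe('s3://my-bucket/folds/valid_%d' % i))
         for i in range(5)]

# Hand the reloaded splits to the parameter search
j = gl.model_parameter_search.create(folds, gl.foobar.create, params)
```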

Please let me know if that helps,

Chris


User 3147 | 2/23/2016, 9:33:03 PM

Thank you, Chris. I will follow your suggestion as a workaround.