Model parameter search in an EC2 environment & uploading training/validation data sets to an S3 path.

User 3147 | 2/3/2016, 1:21:42 PM

Hi all, I am currently interested in deploying a rather tedious model parameter search in an EC2 environment. I have configured the EC2 cluster correctly and have already submitted the corresponding job for execution. I am wondering, though, whether there is also a way to save the training/validation data sets to an S3 path prior to executing the task. I guess it would be much more efficient that way. I am probably asking something naive, but I can't see how it would be possible to do so. Any help?


User 1190 | 2/4/2016, 4:01:46 AM

Hi, yes, it should be possible to have the model parameter search take S3 paths as arguments. Here's a simplified example:

```python
import graphlab as gl

training_set = gl.SFrame('...')
validation_set = gl.SFrame('...')

training_set_path = 's3://YOURPATH/training.sframe'
validation_set_path = 's3://YOURPATH/validation.sframe'

# Persist the data sets to S3 so the job can read them from there.
training_set.save(training_set_path)
validation_set.save(validation_set_path)

job = gl.model_parameter_search.create(
    (training_set_path, validation_set_path),
    gl.linear_regression.create)
```

For more details, please refer to the API doc here.

Cheers, -jay

User 3147 | 2/4/2016, 4:24:46 PM

Hi Jay, thank you for your response. I did not expect it to be that easy.

User 3147 | 2/9/2016, 9:20:20 PM

@Jay, or anyone else who might know: what if I would like to save, to S3, a fold tuple that has been created locally by the "graphlab.cross_validation.KFold" method? Is this also possible through a simple IPython command?

User 3147 | 2/10/2016, 3:18:27 PM

Hi all!!

As a further update to my comment from yesterday: is it possible to save a KFold tuple created by "graphlab.cross_validation.KFold" locally (to disk)? I guess if this were possible, I could also upload this tuple of SFrames to an S3 path, either from an IPython command or from the AWS console.

I look forward to hearing your suggestions.
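One approach, as a minimal sketch: iterating a KFold object yields (train, validation) SFrame pairs, and SFrame.save accepts an S3 URI directly, so each fold can be written out in a loop. The `fold_paths` helper and the `s3://YOURPATH` prefix below are illustrative assumptions, not something from the thread:

```python
def fold_paths(prefix, fold_index):
    # Build a pair of S3 URIs for one fold's train/validation SFrames.
    return ('%s/fold_%d/train.sframe' % (prefix, fold_index),
            '%s/fold_%d/validation.sframe' % (prefix, fold_index))

# Usage with GraphLab Create (assumes graphlab is installed and AWS
# credentials are configured, e.g. via gl.aws.set_credentials):
#
#   import graphlab as gl
#   data = gl.SFrame('...')
#   for i, (train, valid) in enumerate(gl.cross_validation.KFold(data, 5)):
#       train_path, valid_path = fold_paths('s3://YOURPATH', i)
#       train.save(train_path)
#       valid.save(valid_path)
```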

User 3147 | 3/2/2016, 2:03:43 PM

Hi all,

As a further update on my last question, please refer to the post below, which in fact solves the problem.