Cannot parse data on EC2 instance

User 324 | 8/19/2014, 3:56:50 PM

Hello!

I've tried to run an SVM on a c3.xlarge instance with a 1.36 GB dataset. It can download the data from S3, but it always stops while parsing the data. I've tried the same dataset on a local machine and it works fine. Am I missing something?

Kind regards, Alex

PS: Here is the call I use to load the data:

data = graphlab.SFrame.read_csv('s3://ampcamp-arigge/adclickwithoutid.csv', column_type_hints={"Label":int,"I1":int,"I2":int,"I3":int,"I4":int,"I5":int,"I6":int,"I7":int,"I8":int,"I9":int,"I10":int,"I11":int,"I12":int,"I13":int})

Comments

User 14 | 8/19/2014, 8:40:25 PM

Hi Alex,

Can you tell us which version of GraphLab Create you are using?

Thanks, jay


User 324 | 8/20/2014, 11:58:46 AM

I use v0.9.


User 16 | 8/20/2014, 9:07:19 PM

Hi Alex,

I'm sorry you're having an issue with GraphLab Create.

When it stops during parsing, do you get an error? If so, what is the error? Also would it be possible for us to have access to the data set you're using?

Or do you not get an error, but the parsing just never seems to finish? If this is the case, maybe the EC2 host you launched and the S3 bucket are in different AWS regions. Transferring data across regions can be slow. By default, when you launch an EC2 instance, the host runs in 'us-west-2' (i.e. Oregon). I would recommend launching your EC2 host in the same region as your S3 bucket.
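
One way to check for a region mismatch is to ask S3 where the bucket lives and compare that with the region of the EC2 host. The snippet below is only a rough sketch, not part of GraphLab Create: it assumes the boto3 library is installed and AWS credentials are configured, and the helper names (normalize_bucket_region, same_region, lookup_bucket_location) are made up for illustration. S3 reports buckets in US East (N. Virginia) with an empty location, and some older EU buckets as "EU", so those two cases are mapped to region names first.

```python
def normalize_bucket_region(location_constraint):
    """Turn the LocationConstraint value reported by S3 into a region name."""
    # S3 returns no constraint for buckets in US East (N. Virginia).
    if not location_constraint:
        return "us-east-1"
    # Legacy value that some older buckets in Ireland still report.
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def same_region(location_constraint, instance_region):
    """Return True if the bucket and the EC2 host are in the same region."""
    return normalize_bucket_region(location_constraint) == instance_region


def lookup_bucket_location(bucket_name):
    """Ask S3 for the bucket's location (assumes boto3 and AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3

    response = boto3.client("s3").get_bucket_location(Bucket=bucket_name)
    return response.get("LocationConstraint")


if __name__ == "__main__":
    # 'us-west-2' is the default launch region mentioned above.
    location = lookup_bucket_location("ampcamp-arigge")
    if same_region(location, "us-west-2"):
        print("Bucket and EC2 host are in the same region.")
    else:
        print("Bucket is in", normalize_bucket_region(location))
```

If the two regions differ, launching the EC2 host in the bucket's region should avoid the slow cross-region transfer.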

Thanks, Toby


User 941 | 11/13/2014, 8:00:18 AM

Hi, I am experiencing the same problem. Parsing never stops. Regarding "maybe the EC2 host you launched and the S3 bucket are in different AWS regions": S3 does not require region selection.