Uploading and reading large files

User 3092 | 1/21/2016, 7:03:18 AM

I am taking a Machine Learning course on Coursera using an Amazon EC2 instance with the graphlab-create-1.7.1-coursera AMI (ami-182fef7b).

I tried to upload a large file (amazon_baby.gl.zip) and read it with the GraphLab SFrame function. The file is around 40 MB.

To upload it, I followed the instructions at https://dato.com/download/install-graphlab-create-aws-coursera.html

  1. On the http://<public IP address>:8888 page, I selected Upload and chose my file. The upload failed because of the file size.

  2. I then tried Option 2: I uploaded the zip file to S3 and used the following command:

```python
import graphlab
sf = graphlab.SFrame('https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl.zip')
```

I got an error, probably because SFrame does not read zip files:

```
PROGRESS: Downloading https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl.zip/dir_archive.ini to /var/tmp/graphlab-root/1207/aa261f83-83fd-46c8-afcf-e67ed40f16f4.ini
PROGRESS: Failed to download https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl.zip/dir_archive.ini: HTTP response code said error
```

  3. I created a folder amazon_baby.gl in S3, uploaded the individual files into it, and used the following code:

```python
import graphlab
sf = graphlab.SFrame('https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl')
```

I got this error:

```
PROGRESS: Downloading https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl/m_bfaa91c17752f745.frame_idx to /var/tmp/graphlab-root/1207/16aefc6d-ab74-4341-8a2b-59dfe71ae860.frame_idx
PROGRESS: Failed to download https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl/m_bfaa91c17752f745.frame_idx: HTTP response code said error
```

I also tried the same URL with a trailing slash:

```python
import graphlab
sf = graphlab.SFrame('https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl/')
```

This time the error was:

```
PROGRESS: Downloading https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl//dir_archive.ini to /var/tmp/graphlab-root/1207/8d949e1e-807a-4b37-8a82-30045ce0f6ca.ini
PROGRESS: Failed to download https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl//dir_archive.ini: HTTP response code said error
PROGRESS: Downloading https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl/ to /var/tmp/graphlab-root/1207/1471578e-251c-41d9-9fec-75278c2ce593
PROGRESS: Failed to download https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl/: HTTP response code said error
```
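Each failing line in these logs is an HTTP request for one file inside the SFrame directory. One way to narrow the problem down is to request those objects one at a time from the instance and see which ones are reachable. Below is a minimal sketch using the Python 3 standard library; the helper `is_fetchable` is hypothetical, and the idea that the S3 objects may simply not be publicly readable is an assumption to test, not a cause confirmed in this thread.

```python
import urllib.request


def is_fetchable(url, timeout=10):
    """Return True if `url` can be opened and its first byte read, else False.

    Hypothetical diagnostic helper: it only checks reachability and does
    not validate the contents of the file.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read(1)  # one byte is enough to show the body is accessible
        return True
    except (OSError, ValueError):
        # OSError covers URLError and its subclass HTTPError (e.g. 403, 404)
        # as well as timeouts; ValueError covers malformed URLs.
        return False
```

For example, with `base_url = 'https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl/'`, calling `is_fetchable(base_url + 'dir_archive.ini')` and repeating it for each file in the folder would show which objects the instance cannot download.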

Comments

User 2854 | 1/21/2016, 11:51:28 AM

@sumans345 - try checking the GraphLab Create API docs for the data formats that SFrame() supports, here.


User 3092 | 1/22/2016, 2:50:04 AM

Hi CesarO,

Thanks, but I was able to read a file in the same format that Coursera provided in an earlier lesson. The only difference was that that file was small enough to upload directly to EC2. Also, fellow students who use their own machines instead of EC2 could read the same file with the SFrame function. There are basically three steps: (i) upload the file to EC2, (ii) unzip the file, and (iii) read the file using SFrame.
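Step (i) does not have to go through the notebook's Upload button, which failed on this file because of its size. A minimal sketch, assuming the zip sits at a URL the instance can read (for example, an S3 object that has been made public, which is an assumption here), is to download it on the instance itself from a notebook cell. It uses the Python 3 standard library, and the helper name `fetch_to_instance` is hypothetical.

```python
import os
import shutil
import urllib.parse
import urllib.request


def fetch_to_instance(url, dest_dir="."):
    """Download `url` into `dest_dir` and return the local file path.

    Hypothetical helper: the response is streamed to disk in chunks, so a
    ~40 MB file is never held fully in memory.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Use the last component of the URL path as the local file name.
    filename = os.path.basename(urllib.parse.urlparse(url).path) or "download"
    local_path = os.path.join(dest_dir, filename)
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path
```

For example, `fetch_to_instance('https://s3-ap-southeast-1.amazonaws.com/courserasuman/amazon_baby.gl.zip')` would save `amazon_baby.gl.zip` in the notebook's working directory, ready for steps (ii) and (iii).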

I must be messing up somewhere.


User 1189 | 1/25/2016, 6:25:59 PM

Hi,

The zip format is not supported for direct reading. You will need to extract the contents of the zip file and load the extracted directory instead.
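Following that advice, here is a minimal sketch of steps (ii) and (iii) that can run in a notebook cell on the instance (Python 3 standard library). It extracts the archive with `zipfile`, then locates the extracted SFrame directory by finding the `dir_archive.ini` file that the failed downloads in this thread were requesting. The helper name `extract_sframe_dir` is hypothetical, and it assumes the zip contains a saved SFrame directory such as `amazon_baby.gl/`.

```python
import os
import posixpath
import zipfile


def extract_sframe_dir(zip_path, dest_dir="."):
    """Extract `zip_path` into `dest_dir` and return the path of the saved
    SFrame directory, i.e. the folder that holds `dir_archive.ini`.

    Raises ValueError if the archive contains no `dir_archive.ini`.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Zip member names always use "/" as the separator, hence posixpath.
        markers = [name for name in archive.namelist()
                   if posixpath.basename(name) == "dir_archive.ini"]
        if not markers:
            raise ValueError(zip_path + " does not contain a dir_archive.ini file")
        archive.extractall(dest_dir)  # creates dest_dir and subfolders as needed
    # Prefer the shallowest marker in case the archive nests several SFrames.
    parts = [p for p in posixpath.dirname(min(markers, key=len)).split("/") if p]
    return os.path.join(dest_dir, *parts)
```

On the instance, the returned path could then be passed to `graphlab.SFrame`, e.g. `sf = graphlab.SFrame(extract_sframe_dir('amazon_baby.gl.zip'))`.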

Yucheng


User 5258 | 6/11/2016, 10:15:05 AM

My question is: how does one extract it? I downloaded the .zip file and then uploaded it in GraphLab (Jupyter).

I could not find any way to unzip it.

I also tried it the other way round: I first unzipped the file on my PC and then uploaded all the unzipped components. That did not work either.