Graphlab can't find file in S3 bucket

User 2263 | 10/15/2015, 6:35:53 PM

Hi Nice People,

I'm using an EC2 m3.xlarge instance to work interactively with GraphLab Create 1.6.1.

I can easily load SFrames from files located in public S3 buckets. However, when I try to load files located in a private S3 bucket I get:

`python
/usr/local/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in readcsvimpl(cls, url, delimiter, header, errorbadlines, commentchar, escapechar, doublequote, quotechar, skipinitialspace, columntypehints, navalues, lineterminator, usecols, nrows, skiprows, verbose, storeerrors, **kwargs)
   1096             glconnect.getclient().setlogprogress(False)
   1097             with cythoncontext():
-> 1098                 errors = proxy.loadfromcsvs(internalurl, parsingconfig, typehints)
   1099         except Exception as e:
   1100             if type(e) == RuntimeError and "CSV parsing cancelled" in e.message:

/usr/local/lib/python2.7/site-packages/graphlab/cython/context.pyc in exit(self, exctype, excvalue, traceback)
     47         if not self.showcythontrace:
     48             # To hide cython trace, we re-raise from here
---> 49             raise exctype(excvalue)
     50         else:
     51             # To show the full trace, we do nothing and let exception propagate

RuntimeError: Runtime Exception. No files corresponding to the specified path (s3://my-bucket/list-data/allfiles000001fullout.csv).
`

On the other hand, if I list the contents of the same bucket in the Linux shell of the same machine:

`
aws s3 ls --recursive s3://my-bucket/list-data

2015-10-15 16:52:11          0 list-data/
2015-10-15 16:52:12  406097767 list-data/allfiles000001fullout.csv
`

Since the machine can see the files, this does not look like a problem with the AWS credentials setup. Any clues?
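One caveat on that reasoning: the `aws` CLI can resolve credentials from several places (environment variables, `~/.aws/credentials`, or an instance role), so a successful `aws s3 ls` does not by itself show that the Python process sees the same credentials. Below is a minimal sketch of a check to run inside the Python session. It assumes GraphLab Create reads the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, which is an assumption not confirmed in this thread, and `missing_aws_env_vars` is a hypothetical helper name.

```python
import os

# Standard AWS credential variable names. Whether GraphLab Create relies
# on exactly these is an assumption, not something stated in this thread.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(environ=None):
    """Return the credential variable names that are unset or empty.

    `environ` defaults to the environment of the current Python process.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list from `missing_aws_env_vars()` would mean both variables are visible to the interpreter. A non-empty list would suggest the CLI is getting its credentials from somewhere else.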

Thanks

Comments

User 2263 | 10/15/2015, 8:00:45 PM

As an update: the files in the S3 bucket cannot be accessed from GraphLab Create running on EC2 instances, but they can be accessed from GraphLab Create running on local machines.

Is this a bug on your side of things?

Thanks


User 1178 | 10/16/2015, 6:24:25 PM

Hi Vigel,

Which operating system are you using in EC2? GraphLab Create uses the openssl library to communicate with S3. With certain combinations of openssl version and certificate store version in the operating system, openssl may not be able to validate the Amazon certificate correctly.

You may try to validate that using the following command:

  openssl s_client -connect s3.amazonaws.com:443 
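A rough Python equivalent of this check is sketched below, in case it is easier to run from the same session. It assumes a Python 3 interpreter (the traceback above comes from Python 2.7), and it validates against the trust store that Python's `ssl` module uses, which may differ from the one GraphLab Create's openssl uses. The function name `check_s3_tls` and its `cafile` parameter are hypothetical names chosen for illustration.

```python
import socket
import ssl


def check_s3_tls(host="s3.amazonaws.com", port=443, cafile=None, timeout=10.0):
    """Attempt a TLS handshake with certificate and hostname verification.

    Returns (True, protocol_version) on success, or (False, error_text)
    if the connection or the certificate validation fails.
    `cafile` optionally points at an alternative CA bundle.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:
        # ssl.SSLError (including certificate verification failures)
        # is a subclass of OSError, so it is caught here as well.
        return False, str(exc)
```

If `check_s3_tls()` returns `False` with a certificate verification message on the EC2 instance but succeeds on a local machine, that would be consistent with the certificate store issue described above.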

Here are some links that describe the certificate issue, if you are interested:

 http://curl.haxx.se/mail/archive-2014-10/0066.html  
 https://forums.aws.amazon.com/thread.jspa?threadID=164095

We know that CentOS has this particular problem, and we have not seen the issue on Ubuntu. So if you happen to be using a CentOS AMI, I would suggest trying a different AMI.

Thanks! Ping


User 2263 | 10/19/2015, 7:55:47 PM

Hi Ping,

Thanks for your response.

I'm using the Amazon Linux AMI, which seems to be Fedora based.

  NAME="Amazon Linux AMI"
  VERSION="2015.09"
  ID="amzn"
  ID_LIKE="rhel fedora"
  VERSION_ID="2015.09"
  PRETTY_NAME="Amazon Linux AMI 2015.09"
  ANSI_COLOR="0;33"
  CPE_NAME="cpe:/o:amazon:linux:2015.09:ga"
  HOME_URL="http://aws.amazon.com/amazon-linux-ami/"

The openssl fix didn't solve the issue, so I still can't access a private S3 bucket from within GraphLab Create. Any further suggestions to try?

Thanks in advance.


User 1178 | 10/21/2015, 4:59:15 PM

Hi,

We have found that the latest Ubuntu image on Amazon does not have this SSL issue. Please consider using that image. If you have to use the image you are currently using, you may want to ask the question in the Amazon developer forum.

In the meantime, we are modifying GraphLab Create to use an alternative certificate file when talking to S3. The fix will be in the next release.

Thanks! Ping


User 2263 | 10/21/2015, 6:18:17 PM

Hi Ping,

Thanks for your suggestion; the Ubuntu machine works nicely. However, how can I control which image is deployed when configuring machines through the GraphLab API? In other words, there seems to be no argument for choosing the image type when calling gl.deploy.Ec2Config().

Thanks!


User 1178 | 10/23/2015, 3:05:07 PM

Hi,

All GraphLab Create deployment APIs use an Ubuntu image as the base image, so you will be fine. Are you planning to use the Ec2Cluster API or a Predictive Service?

Thanks!

Ping


User 2263 | 10/26/2015, 8:29:18 PM

Hi Ping,

For now I'll be using just the Ec2Cluster API.

Thank you!