Error while retrieving results from a job that ran on EC2

User 2059 | 6/17/2015, 2:35:57 PM

Hi,

I'm trying to run a simple job on EC2. I use the following script:

import graphlab as gl

def add(x, y):
    return x + y

ec2 = gl.deploy.environment.EC2(name='addition',
                                s3_folder_path=my_test_folder,
                                aws_access_key=my_aws_access_key,
                                aws_secret_key=my_aws_secret_key,
                                region='eu-west-1',
                                tags=my_tags)
job = gl.deploy.job.create(add, x=1, y=1, environment=ec2)
print 'Job finished with results: ' + str(job.get_results())

I see that an EC2 instance starts and terminates as expected and that the job is scheduled. However, an error is raised on job.get_results():

Traceback (most recent call last):
  File "code/ec2.py", line 20, in <module>
    print 'Job finished with results: ' + str(job.get_results())
  File "/Users/aleksandra.wozniak/anaconda/envs/dato-env/lib/python2.7/site-packages/graphlab/deploy/_job.py", line 473, in get_results
    self._wait_for_job_finish()
  File "/Users/aleksandra.wozniak/anaconda/envs/dato-env/lib/python2.7/site-packages/graphlab/deploy/_job.py", line 603, in _wait_for_job_finish
    while not self._job_finished():
  File "/Users/aleksandra.wozniak/anaconda/envs/dato-env/lib/python2.7/site-packages/graphlab/deploy/_job.py", line 597, in _job_finished
    return self._is_final_state(self.get_status(_silent = True))
  File "/Users/aleksandra.wozniak/anaconda/envs/dato-env/lib/python2.7/site-packages/graphlab/deploy/_job.py", line 347, in get_status
    self._finalize()
  File "/Users/aleksandra.wozniak/anaconda/envs/dato-env/lib/python2.7/site-packages/graphlab/deploy/_job.py", line 830, in _finalize
    self._metrics[-1]['status'] == 'Failed':
IndexError: list index out of range

Moreover, I cannot access the logs of this job from GraphLab Canvas (there is no link to the log file). I'm using GraphLab Create v1.4.1.

Can someone help me to debug this? Thank you!

Comments

User 16 | 6/17/2015, 8:38:32 PM

I just tried your code and it worked for me. The log files you're looking for can be found in the s3_folder_path.
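If you'd rather pull them down programmatically than browse the AWS console, something along these lines should work (a minimal sketch using boto; the bucket name and prefix here are hypothetical, so substitute the bucket and folder from your own s3_folder_path):

import boto

# Connect with the same credentials you passed to the EC2 environment.
conn = boto.connect_s3(aws_access_key_id=my_aws_access_key,
                       aws_secret_access_key=my_aws_secret_key)

# Hypothetical names: use the bucket and folder from your s3_folder_path.
bucket = conn.get_bucket('my-test-bucket')
for key in bucket.list(prefix='my-test-folder/'):
    print key.name
    # Save each log file locally under its base name.
    key.get_contents_to_filename(key.name.split('/')[-1])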

Thanks for the stack trace. Looking at that code, this appears to be caused by an eventual-consistency issue with S3: the job's metrics file wasn't visible yet when get_results() tried to read it. The problem you encountered should be pretty rare (I've never heard of it happening before). If you try it again, I would expect it to work.
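In the meantime, if you do hit it again, a simple retry around get_results() should be enough to ride out the inconsistency (a minimal sketch; the attempt count and delay are arbitrary, not anything built into GraphLab):

import time

def get_results_with_retry(job, attempts=5, delay=10):
    # Retry the transient IndexError caused by S3 eventual consistency.
    for attempt in range(attempts):
        try:
            return job.get_results()
        except IndexError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)  # give S3 a moment to catch up

print 'Job finished with results: ' + str(get_results_with_retry(job))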

I'll make sure we include a fix for this issue in our next release.