Job Output

User 912 | 2/3/2015, 9:07:45 PM

Hello,

It's me again! Could someone please explain how to get the results from a job (not just a printout of them)? I think I am doing it wrong. I understand that a task can only have a GraphLab data structure as an output (result), such as an SArray, SFrame, or model. In my case, whatever my task function does, the result ends up stored in an SArray. I also specify the outputs for the task, but after the job completes I can't open the supposedly created SArray, because it doesn't exist.

I've tried running the task in a job that just prints the SArray, and it appears in the log, so it is created, but somehow it's not stored in the output destination.

In my function I have:

(...)
sa = graphlab.SArray(data=[d])
task.outputs['numbers'] = sa

And then for the task and job I have:

temp_task.set_outputs({'numbers': '~/Desktop/UNI/results'})
my_job = graphlab.deploy.job.create([temp_task])

Thank you in advance.

Comments

User 92 | 2/4/2015, 8:00:02 PM

Hi vyara,

Are you running your job on an EC2 instance or locally? If you are running the job on EC2, the output is usually set to an S3 path so that you can access it from other places, including the client machine that launches the job.

Here is an example of how to set the output and get the job result:

<pre>import graphlab as gl

# your own query code that outputs an array
def my_query(task):
    import graphlab as gl
    task.outputs['output'] = gl.SArray([1, 2, 3])

# create a task whose output goes to some S3 path
t = gl.deploy.Task('t')
t.set_code(my_query)
t.set_outputs({'output': 's3://your-s3-output-path'})

# start the job
gl.aws.set_credentials('your-key', 'your-credential')
env = gl.deploy.environment.EC2('test', 's3://your-s3-log-path')
j = gl.deploy.job.create(t, environment=env, name='test-job')

# when the job finishes, get the result by loading it from the path you specified
sa = gl.SArray('s3://your-bucket-name/your-output-path')

</pre>

Note that you can also find where the job output goes by printing all tasks the job has:

<pre>j.tasks </pre>
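Since the question used an output path beginning with `~`, one thing that may be worth ruling out when running locally is whether that path is being expanded at all. This is only a guess, not something confirmed in this thread: `~` is normally expanded by the shell, and Python code does not expand it unless asked to. A minimal sketch in plain Python (no GraphLab calls; the `outputs` dictionary is just the value one would then hand to the task) that turns the path into an absolute one first:

```python
import os

# Output location taken from the question above.
raw_path = '~/Desktop/UNI/results'

# Expand '~' to the current user's home directory, then make the path absolute.
output_path = os.path.abspath(os.path.expanduser(raw_path))

# Dictionary that could be passed as the task's outputs instead of the '~' form.
outputs = {'numbers': output_path}

print(outputs['numbers'])
```

If the printed path is where the results were expected, the same absolute path can be used both when setting the task outputs and when loading the result afterwards.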

Let me know if you have more questions.


User 912 | 2/4/2015, 9:07:16 PM

Yes, actually this helped a lot, thank you for explaining. I have a couple more questions though, I hope you won't mind.

How about when you're running a job with parallel tasks? Do you have to specify an output for every task (every entry in the parameters list)? The log says something about no output for each parameter entry in the list.

Also, in the end, this kind of job automatically produces an SFrame, is that correct? Like I mentioned before, my function code is supposedly setting the output to an SArray, but the output folder it creates stores an SFrame.

P.S. I am running everything locally. :)

Cheers.


User 92 | 2/4/2015, 10:28:28 PM

Hi vyara,

Here is an example of running in parallel. You are right that we only support SFrame as output for now, so if your output is an SArray, you will have to work around it by returning an SFrame with one column. Here is an example that works:

<pre># sample query that produces an SFrame according to the parameter 'count'
def my_query(task):
    import graphlab as gl
    num_items = task.params['count']
    task.outputs['output'] = gl.SFrame({'a': range(num_items)})

# create a task whose output goes to some local path; for running in EC2,
# you will want to use an S3 path
t = gl.deploy.Task('mytask')
t.set_code(my_query)
t.set_outputs({'output': '/tmp/my_output'})

# parallel jobs, with four inputs for the 'count' parameter
params = gl.SFrame({'count': [1, 2, 3, 4]})

# this runs locally for now since no env is passed; you can also run it
# in EC2 by passing in an env created from graphlab.deploy.environment.EC2(...)
pj = gl.deploy.parallel_for_each(t, params, 'parallel-job-example')

# wait for pj to finish by using pj.get_status()

# now get the result
sf = gl.SFrame('/tmp/my_output')

# this is the sample output
In [137]: sf = gl.SFrame('/tmp/my_output')

In [138]: sf
Out[138]:
Columns:
    a           int
    parameters  dict

Rows: 10

Data:
+---+--------------+
| a |  parameters  |
+---+--------------+
| 0 | {'count': 1} |
| 0 | {'count': 2} |
| 1 | {'count': 2} |
| 0 | {'count': 3} |
| 1 | {'count': 3} |
| 2 | {'count': 3} |
| 0 | {'count': 4} |
| 1 | {'count': 4} |
| 2 | {'count': 4} |
| 3 | {'count': 4} |
+---+--------------+
[10 rows x 2 columns]

</pre>
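To see why the combined result has 10 rows and an extra `parameters` column, here is a rough plain-Python illustration of what the sample output suggests: the task runs once per parameter row, each output row is tagged with the parameters that produced it, and everything is concatenated. This is not GraphLab's implementation, only a sketch that mimics the observed output; `run_query` and `combine` are hypothetical names.

```python
def run_query(count):
    # Stand-in for the task body: one row per item,
    # mirroring gl.SFrame({'a': range(count)}).
    return [{'a': i} for i in range(count)]


def combine(param_rows):
    # Run the query once per parameter row and tag every output row
    # with the parameters that produced it.
    combined = []
    for params in param_rows:
        for row in run_query(params['count']):
            tagged = dict(row)
            tagged['parameters'] = dict(params)
            combined.append(tagged)
    return combined


param_rows = [{'count': c} for c in [1, 2, 3, 4]]
result = combine(param_rows)

for row in result:
    print(row['a'], row['parameters'])
```

With counts 1, 2, 3, and 4 the individual outputs have 1 + 2 + 3 + 4 = 10 rows in total, which matches the row count in the sample output.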