GraphLab Create Deployment

User 759 | 9/23/2014, 5:44:38 PM

Hello,

I had some difficulty running the end-to-end recommender example in the User Guide. I had to add a new input for the recommend_items function, recommend.set_inputs({'data': ('clean', 'cleaned')}), and the users parameter should be recommendations = task.inputs['model'].recommend(users=data['user']), not users=data['uid'].
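In other words, the two changes I made look like this (a sketch; data here is the SFrame the tutorial reads from the task inputs inside recommend_items):

<pre class="CodeBlock"><code># bind the 'cleaned' output of the 'clean' task as the recommend task's 'data' input
recommend.set_inputs({'data': ('clean', 'cleaned')})

# inside recommend_items: recommend for the 'user' column, not 'uid'
recommendations = task.inputs['model'].recommend(users=data['user'])</code></pre>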

Now that I've run the job, how do I get the output? I tried grabbing the 'recommend' task and calling get_outputs(), but the 'recommendations' output is None.

Comments

User 91 | 9/26/2014, 7:18:12 AM

Can you paste the code snippet that you were working with?


User 759 | 9/26/2014, 6:50:04 PM

Hey,

I'm just running the tutorial code plus a few extra lines to get it to work. I should mention that I want to run this outside of an interactive IPython shell. I also tried adding this to the end of the file:

<pre class="CodeBlock"><code>while recommend.get_outputs()['recommendations'] is None:
    pass
recommend.get_outputs()['recommendations'].save('recommender.model')</code></pre>

but it didn't work. Also, is there any way to rerun the job without deleting it and creating it again?


User 10 | 9/26/2014, 8:43:07 PM

Hey Dillon -

Thanks for trying out GraphLab Data Pipelines. You bring up a good point: the User Guide doesn't make it clear how to get the output of the example. I will update the User Guide accordingly.

<h2>Getting the Output</h2>

To get the SFrame for the <code class="CodeInline">recommendations</code> output of the <code class="CodeInline">recommend</code> task, that output needs to be bound to a path. That way, Data Pipelines will write out the SFrame after the Task completes.

Unless Task outputs are bound to a path, the framework won't write them. This is beneficial when an output is intermediate and consumed by a subsequent task, since there is no need to spend time persisting it.

To specify a path for persisting the recommendations, here is how the example would change:

<pre class="CodeBlock"><code>tasks_with_bindings = [
    ('clean', {'csv': 'https://s3.amazonaws.com/GraphLab-Datasets/movie_ratings/sample.large'}),
    ('train', {}),  # no bindings needed
    ('recommend', {'recommendations': '/tmp/recommendation-results'})
]

local = gl.deploy.environment.LocalAsync('local')  # named environment
job_local = gl.deploy.job.create(
    tasks_with_bindings, name='local-exec', environment=local)
job_local.save()</code></pre>

Now, to load the stored recommendations:

<pre class="CodeBlock"><code>results = gl.SFrame('/tmp/recommendation-results')</code></pre>
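From there you can inspect the results like any other SFrame, for example (assuming the Job has finished and the Task has written its output):

<pre class="CodeBlock"><code>print results.num_rows()  # how many recommendations were written
print results.head()      # peek at the first few rows</code></pre>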

<h2>Rerunning the Job</h2>

The way to think about the <code class="CodeInline">graphlab.deploy.job.create</code> API is that it initiates execution of a set of tasks and creates a <code class="CodeInline">Job</code> object for that execution.

So that asynchronous execution can be started from a Python session and continue after that session terminates, GraphLab Create maintains a session of Jobs/Tasks. This is why each Job requires a unique name.

So, to run the same set of tasks with the same bindings, the simplest way to avoid specifying a new name on each invocation is not to specify a name at all; the framework will then assign a name using the Task names and a timestamp. This way each Job gets a unique name, which makes it easy to run the same set of Tasks repeatedly while debugging.

<pre class="CodeBlock"><code>job_unique_name = gl.deploy.job.create(tasks_with_bindings, environment=local)</code></pre>

Please continue to send feedback about using GraphLab Data Pipelines.


User 759 | 9/29/2014, 3:02:43 PM

That makes sense, thank you very much!