Hadoop & Graphlab

User 1952 | 5/23/2015, 12:12:32 PM


Is there any special configuration needed for Hadoop to run GraphLab?

I'm trying to run the following code on the Cloudera Hadoop distribution VM (CDH 5.4 VM), but the job status stays at "pending":

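import graphlab as gl

def add(x, y):
    return x + y

# Hadoop environment that picks up the cluster configuration from config_dir
hd = gl.deploy.environment.Hadoop('hd', config_dir='/etc/hadoop/conf')

# Submit add(1, 1) as a job to that environment
job = gl.deploy.job.create(add, x=1, y=1, environment=hd)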

I'm running this code exactly as described in the user guide, without any special configuration, so I don't know whether some additional setup is needed.


User 1958 | 5/24/2015, 7:08:08 AM

Hi Andrew, a "pending" status means that the job has not yet been scheduled for execution. It takes some time to spin up the execution environment for a job. Once that is done, a state transition will occur. All the states associated with the job life cycle can be found at the following link


Thanks sethu
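
For anyone who wants to wait for that state transition without blocking forever, here is a minimal polling sketch. The helper name `wait_for_status`, its parameters, and the lowercase "pending" comparison are all hypothetical and not part of GraphLab Create; it only assumes you can hand it some zero-argument callable that returns the current status (for example the job object's status method, if your version provides one).

```python
import time


def wait_for_status(get_status, timeout_s=300.0, interval_s=5.0,
                    pending=("pending",)):
    """Poll get_status() until it leaves the pending states or time runs out.

    get_status is any zero-argument callable returning the current status
    (hypothetical interface; adapt it to your job object).
    Returns the last status seen, which is still a pending one on timeout.
    """
    deadline = time.time() + timeout_s
    while True:
        status = get_status()
        # Compare case-insensitively, since the exact spelling of the
        # status string is an assumption here.
        if str(status).lower() not in pending:
            return status
        if time.time() >= deadline:
            return status
        time.sleep(interval_s)
```

If the returned status is still pending after a generous timeout, the job was probably never scheduled, and the Hadoop-side logs are the next place to look.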

User 1952 | 5/25/2015, 9:02:53 PM

Hi @seth

Actually, it remains pending no matter how long I wait! I repeated the procedure on the Hortonworks sandbox, and this time the status was "failed". I read through the log file but couldn't figure out the problem. I have attached the log file, and I would be very grateful if somebody could help me with this issue. I really need to run GraphLab on Hadoop for an academic project, and right now I'm stuck and time is against me! :(

User 1178 | 5/25/2015, 11:09:22 PM

Hi Andrew,

To run any GraphLab Create job in Hadoop, we require Python 2.7 and virtualenv to be available on all nodes of the Hadoop cluster. From your log file, I can see that you do not have virtualenv set up on your nodes. Can you check your Hadoop node configuration?



User 1952 | 5/26/2015, 7:52:29 PM

Hi @"Ping Wang"

I've created a virtualenv on my single Hadoop node (the Hortonworks sandbox VM) which includes Python 2.7 and GraphLab. I run the source file from a shell with the virtualenv activated. Is this enough, or should I do some additional configuration?

User 1178 | 5/26/2015, 9:08:38 PM

Hi Andrew,

You do not really need to create a virtualenv. As long as you have virtualenv installed, you are fine.

The error you saw before is caused by the 'virtualenv' package not being available in your 'root' Python environment.
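
One quick way to check this on a node is to run a small script with the node's system ("root") Python, outside any activated virtualenv. The sketch below is only an illustration: the helper names `module_available` and `describe_interpreter` are made up for this example, and it assumes that being able to import `virtualenv` is a reasonable stand-in for "the package is installed". It does not reproduce whatever check GraphLab Create performs itself.

```python
import importlib
import sys


def module_available(name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def describe_interpreter():
    """Return the interpreter path and its major.minor version as a string."""
    return "%s (Python %d.%d)" % (
        sys.executable, sys.version_info[0], sys.version_info[1])


if __name__ == "__main__":
    # Run this with the system Python of each Hadoop node.
    print(describe_interpreter())
    print("virtualenv importable: %s" % module_available("virtualenv"))
```

If the last line reports False for the system interpreter, installing the virtualenv package into that interpreter (rather than into a separate virtualenv) is the thing to try.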