Error when running 'hello world' on GraphLab distributed

User 3383 | 3/9/2016, 5:22:37 PM

I followed the instruction guide at https://dato.com/learn/userguide/deployment/pipeline-hadoop-setup.html and set up GraphLab Distributed on my Hadoop cluster, but I got this error: "Diagnostics: {"numFailedContainers":1,"numTotalContainers":4,"exit_status":21,"error":"Execution script exited in error","numCompletedContainers":1}".

My Hadoop cluster has 4 machines, one of which acts as the master. I submit the job from the master machine. This is the code:

import graphlab as gl

# Create a Hadoop cluster environment pointing at the GraphLab Distributed
# installation on HDFS and the local Hadoop configuration directory.
c = gl.deploy.hadoop_cluster.create(
    name='test-cluster',
    dato_dist_path='hdfs://hadoopmaster:9000/usr/graphlab',
    hadoop_conf_dir='/usr/local/hadoop/etc/hadoop')

def echo(input):
    return input

# Submit the echo function as a job on the cluster and wait for the result.
j = gl.deploy.job.create(echo, environment=c, input='hello world!')
j.get_results()

I noticed there are a lot of error messages like this in the containers' logs:

2016-03-09T17:05:40,PIPELINE,ERROR,Failed to get  hdfs://hadoopmaster:9000/user/hduser/dato_distributed/jobs/echo-Mar-09-2016-17-03-40/commander_init.status -> /tmp/pipeline-aYORa4: /bin/sh: 1: hadoop: not found

2016-03-09T17:05:40,PIPELINE,ERROR,Failed to get  hdfs://hadoopmaster:9000/user/hduser/dato_distributed/jobs/echo-Mar-09-2016-17-03-40/commander_init.status -> /tmp/pipeline-f5Yp5O: /bin/sh: 1: hadoop: not found

2016-03-09T17:05:40,PIPELINE,ERROR,Failed to get  hdfs://hadoopmaster:9000/user/hduser/dato_distributed/jobs/echo-Mar-09-2016-17-03-40/commander_init.status -> /tmp/pipeline-Zy4Mxu: /bin/sh: 1: hadoop: not found

Can you help me deal with it? Thank you very much.

Comments

User 3383 | 3/9/2016, 5:42:23 PM

The files exist in HDFS, and this command works:

hadoop dfs -get hdfs://hadoopmaster:9000/user/hduser/dato_distributed/jobs/echo-Mar-09-2016-17-03-40/commander_init.status

User 17 | 3/9/2016, 7:31:05 PM

Hey there, sorry you're running into problems. It looks like the hadoop command may not be available to all users on all nodes (or at least, not to the user YARN runs as). Are you able to run the hadoop dfs command you provided on all nodes in your cluster?

Also, can you verify that the hadoop command is available to whatever user YARN runs containers as? (Usually this user is yarn.)
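For example, a quick check along these lines, run on each node, should show whether the hadoop binary is on the PATH for that user. This is just a sketch: the user name yarn is an assumption, so substitute whatever user your NodeManagers actually run containers as.

import subprocess

# Check whether the `hadoop` binary is on the PATH for the container user.
# 'yarn' is an assumption here; replace it with the actual NodeManager user.
result = subprocess.call(['sudo', '-u', 'yarn', 'sh', '-lc', 'which hadoop'])
if result == 0:
    print("hadoop is on the PATH for user 'yarn'")
else:
    print("hadoop is NOT on the PATH for user 'yarn'; "
          "add the Hadoop bin directory to that user's PATH")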


User 3383 | 3/10/2016, 2:33:53 AM

Yes, this command is available on all nodes for the hadoop user.


User 3383 | 3/10/2016, 6:17:33 AM

I figured that problem out, but I ran into another error. Here is the error log from worker_flask.py.log:

Traceback (most recent call last):
  File "/tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1457580897907_0007/filecache/13/bins/pipeline/worker_flask.py", line 90, in run
    args['task_index'])
  File "/tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1457580897907_0007/filecache/13/bins/pipeline/worker_flask.py", line 166, in unwrap_task
    raise ValueError(message)
ValueError: Could not deserialize job., error: 'module' object has no attribute '_builtin_type'

Can you help me deal with it? Thank you very much.