Troubleshooting - Run job on hadoop cluster

User 1637 | 3/25/2015, 3:02:00 PM

Hi all,

I tried to run the following program, but my job failed. Does anyone have an idea? <pre class="CodeBlock"><code>import graphlab as gl

def add(x, y):
    return x + y

# Define your Hadoop environment
hd = gl.deploy.environment.Hadoop('test2', configdir='/etc/hadoop/conf.cloudera.yarn/', memorymb=16384)
hd.save()

# Execute the job.
job = gl.deploy.job.create(add, x = 1, y = 1, environment = hd)
print job.get_results()</code></pre>

Please find attached the log file. I saw that virtualenv is not found, but I have installed it on the edge node and on the other nodes.

Thanks in advance for your help.

Comments

User 1637 | 3/25/2015, 5:03:33 PM

I tried this notebook too, same issue. http://graphlab.com/learn/gallery/notebooks/datapipelinerecsysintro.html


User 1178 | 3/25/2015, 5:38:04 PM

Hi CourbeB,

It seems virtualenv-2.7 does not exist on your Hadoop nodes. Usually it is created automatically when you install virtualenv for Python 2.7. For now, the workaround is to create a symlink to the actual virtualenv binary:

	ln -s {path/to/virtualenv} {same-path/to/virtualenv-2.7}
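For anyone who wants to script this on several nodes, here is a minimal Python 3 sketch of the same step. It assumes virtualenv and the missing virtualenv-2.7 should live in the same directory; the function name and its parameters are made up for illustration and are not part of GraphLab.

```python
import os


def ensure_versioned_alias(bin_dir, name="virtualenv", alias="virtualenv-2.7"):
    """Create bin_dir/alias as a symlink to bin_dir/name if the alias is missing.

    Returns True when a new symlink was created and False when the alias
    already existed. Raises FileNotFoundError if bin_dir/name is absent.
    """
    target = os.path.join(bin_dir, name)
    link = os.path.join(bin_dir, alias)
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    if os.path.lexists(link):
        # Leave an existing file or symlink untouched.
        return False
    os.symlink(target, link)  # same effect as: ln -s target link
    return True
```

You would call it with the directory that holds your virtualenv binary, for example `ensure_versioned_alias("/usr/bin")` (that path is only an example), and you need write permission on that directory.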

Please let me know if that works for you.

We will try to fix this in the coming release.

Thanks!

Ping


User 1637 | 3/27/2015, 3:29:38 PM

Thanks for your answer! Here is the new log file. Same issue, do you have any other idea?

Baptiste


User 1637 | 3/31/2015, 10:32:54 AM

I finally found where the issue is.

My Hadoop cluster runs on CentOS. To install Python 2.7 safely, I used the SCL repo. With SCL, Python 2.7 is installed under a different path, and you have to "enable" it by adding a script to "profile.d". That way, each time a bash shell is launched, the python-2.7 command is available.

See:
http://wiki.centos.org/AdditionalResources/Repositories/SCL
http://developerblog.redhat.com/2013/08/08/software-collections-quickstart/

The trouble is that GraphLab does not launch any "bash" shell, so it cannot find python-2.7 (or virtualenv-2.7), since the command is not enabled.
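To check this kind of problem on a node, a small Python 3 sketch like the one below can report which commands are not resolvable on a given PATH. The helper names are hypothetical, and the command names python2.7 and virtualenv-2.7 in the usage note are assumptions about what the generated script looks for.

```python
import os
import shutil


def missing_commands(commands, search_path):
    """Return the commands that cannot be resolved on the given PATH string."""
    return [cmd for cmd in commands if shutil.which(cmd, path=search_path) is None]


def with_extra_dir(search_path, extra_dir):
    """Prepend extra_dir to a PATH string.

    This is roughly the effect an 'enable' script in profile.d has on PATH
    (an enable script may also set other variables, which this ignores).
    """
    if not search_path:
        return extra_dir
    return extra_dir + os.pathsep + search_path
```

Running `missing_commands(["python2.7", "virtualenv-2.7"], os.environ.get("PATH", ""))` from a process that was not started through a login shell shows what that process can actually see. Comparing the result with and without the SCL bin directory added via `with_extra_dir` tells you whether the missing "enable" step is the cause.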


User 1637 | 3/31/2015, 12:01:47 PM

(Sorry, I did not find how to edit my previous post.)

When you run GraphLab, "glcreatebase_virtenv.sh" is generated and executed. However, this script does not seem to find python-2.7, and consequently virtualenv-2.7.

Do you have any idea how to solve this? Or is there an alternative way to launch a job on a Hadoop cluster?

Thanks in advance for your help!

Baptiste


User 1637 | 4/1/2015, 9:40:11 AM

Hi all,

After many tricks, I finally managed to launch GraphLab Server, but I'm now facing a new issue. Here is the log:

[WARNING] Unable to create session in specified location: '/home/.graphlab/artifacts'. Using: '/tmp/graphlab-tmp-session-dFplhx'
[INFO] Doing work
[INFO] directory=/data.2/yarn/nm/usercache/agerbeaux/appcache/application14274674762490018/containere191427467476249001801000002/topasync.tar.gz/steps/0
[INFO] taskfile=/data.2/yarn/nm/usercache/agerbeaux/appcache/application14274674762490018/containere191427467476249001801000002/topasync.zip/steps/0/0
[INFO] path=/data.2/yarn/nm/usercache/agerbeaux/appcache/application14274674762490018/containere191427467476249001801000002/topasync.zip/steps/0
[INFO] after GLUnpickler
[INFO] after load
[INFO] attempting to find hadoop core-site.xml at /var/run/cloudera-scm-agent/process/1884-yarn-NODEMANAGER
[INFO] Start server at: ipc:///tmp/graphlabserver-48859 - Server binary: /tmp/tmp.bpxmeZIxAd__GLVIRTENV/lib/python2.7/site-packages/graphlab/unityserver - Server log: /var/log/hadoop-yarn/container/application14274674762490018/containere191427467476249001801000002/graphlabserver1427819848.log
[INFO] GraphLab Server Version: 1.3.0
[INFO] configs = [{'namenode': 'nameservice1', 'port': 8020}]
[INFO] got user agerbeaux
[INFO] got glapphdfsurl hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,
[INFO] Running job: add-Mar-31-2015-18-36-53
[INFO] job.execdir=hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,
[INFO] job.runtimetask_paths={Task : add Description : Input(s) : {'y': 1, 'x': 1} Output : None Package(s) : [] Code :
def add(x, y): return x + y
: 'hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,/0-0-add.gl'}
[INFO] Execution started : add-Mar-31-2015-18-36-53
[INFO] Execution path : hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,
[INFO] Task started : add
[INFO] Task completed: add
[INFO] called with archfilename /tmp/tmpMwl1fq hdfspath hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,/0-0-add.gl hadoopconfdir None
[ERROR] Job execution failed. Traceback (most recent call last)
Traceback (most recent call last):
  File "/tmp/tmp.bpxmeZIxAd__GLVIRTENV/lib/python2.7/site-packages/graphlab/deploy/executionenvironment.py", line 105, in runjob
    job = self.runjob(job, statuspath=statuspath)
  File "/tmp/tmp.bpxmeZIxAdGLVIRTENV/lib/python2.7/site-packages/graphlab/deploy/executionenvironment.py", line 135, in runjob
    savepath=job.runtimetaskpaths[task])
  File "/tmp/tmp.bpxmeZIxAdGLVIRTENV/lib/python2.7/site-packages/graphlab/deploy/executionenvironment.py", line 188, in runtask
    rttask.savetofile(savepath)
RuntimeError: Failed to copy /tmp/tmpMwl1fq -> hdfs hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,/0-0-add.gl: put: `hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application1427467476249_0018,/0-0-add.gl': No such file or directory

Error type : Failed to copy /tmp/tmpMwl1fq -> hdfs hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,/0-0-add.gl: put: `hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,/0-0-add.gl': No such file or directory

Error message : Failed to copy /tmp/tmpMwl1fq -> hdfs hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,/0-0-add.gl: put: `hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,/0-0-add.gl': No such file or directory

Failed to copy /tmp/tmpMwl1fq -> hdfs hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,/0-0-add.gl: put: `hdfs://nameservice1:8020/user/agerbeaux/GraphLabDeploys/application14274674762490018,/0-0-add


User 1178 | 4/1/2015, 8:01:18 PM

Hi Baptiste,

This is a bug in our Hadoop code. It tries to find the application ID from the environment variables LOCAL_DIR and LOG_DIR. If LOCAL_DIR contains more than one path, the application ID cannot be found correctly, which is why the HDFS path in your log ends with a stray comma.

We will fix it in the coming release. For now, you may work around it by configuring the local_dir in your YARN configuration to have only one directory.
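To illustrate the kind of parsing involved, here is a minimal Python 3 sketch that pulls an application ID out of a value listing several directories. This is not GraphLab's actual code. It assumes the value is a comma-separated list of paths and that application IDs have the form application_<timestamp>_<sequence>; the function name is hypothetical.

```python
import re

# Assumed shape of a YARN application ID: application_<cluster timestamp>_<sequence>.
_APP_ID = re.compile(r"application_\d+_\d+")


def application_id_from_dirs(dirs_value):
    """Return the application ID found in a comma-separated list of directories.

    Each entry is searched separately, so a value that lists several disks
    still yields a clean ID with no trailing comma or second path attached.
    Returns None when no entry contains an application ID.
    """
    for entry in dirs_value.split(","):
        match = _APP_ID.search(entry.strip())
        if match:
            return match.group(0)
    return None
```

Splitting on the comma before matching is the point of the sketch: treating the whole value as a single path is what would leave a comma attached to the ID when there is more than one directory.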

Let me know if it works for you.

Thanks!

Ping


User 1637 | 4/2/2015, 8:56:18 AM

Thanks for your reply. Actually, YARN has 3 directories in its local_dir configuration, since there are 3 physical disks, so I cannot easily change the YARN configuration.

The only solution would be to build one logical disk from these 3 disks, which means a full reinstallation of the Hadoop cluster, and that is currently impossible.


User 1637 | 4/7/2015, 1:43:03 PM

Hi,

I was wondering when you plan to publish a new release (with the bug fix). I am very interested in the distributed part, and I have not managed to find any workaround for this issue.

Thanks in advance for your reply,

Baptiste


User 1190 | 4/7/2015, 5:42:45 PM


User 1637 | 4/10/2015, 3:25:13 PM

Thanks for your reply! I am looking forward to this release :)