Hello world does not work on Dato Distributed! (exitStatus=21, diagnostics=Exception from container-launch)

User 11 | 12/10/2015, 6:01:32 PM

Hi,

I am new to GraphLab Create. I followed the installation guide at https://dato.com/learn/userguide/deployment/pipeline-hadoop-setup.html. However, when I run the sample Hello World code, I get the error "exitStatus=21, diagnostics=Exception from container-launch.". The complete task log is attached.

This is on my Hadoop 2.7.1 cluster. It has 3 machines, one of which acts as the master. I submit the job from the master machine (<myIP-38>). All machines run Ubuntu 14.04.

This is the code I ran (copied from the installation guide):

```python
import graphlab as gl

# Create cluster
c = gl.deploy.hadoop_cluster.create(
    name='test-cluster',
    dato_dist_path='hdfs://<myIP-38>:8020/user/name/dd',
    hadoop_conf_dir='~/yarn-config')

def echo(input):
    return input

j = gl.deploy.job.create(echo, environment=c, input='hello world!')
```

Thanks, -Khaled

```
15/12/10 11:34:36 INFO applications.ApplicationMaster: Initializing ApplicationMaster
Application master for app, appId=4, clustertimestamp=1449707595238, attemptId=1

GRAPHLAB VALS datoDistribInstall=hdfs://<myIP-38>:8020/user/name/dd jobWorkingDir=hdfs://<myIP-38>:8020/user/kammar/dato_distributed/jobs/echo-Dec-10-2015-11-34-17

applications.ApplicationMaster: Got container status for containerID=container_1449707595238_0004_01_000003, state=COMPLETE, exitStatus=21, diagnostics=Exception from container-launch.
Container id: container_1449707595238_0004_01_000003
Exit code: 21
Stack trace: ExitCodeException exitCode=21:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
	at org.apache.hadoop.util.Shell.run(Shell.java:456)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 21
```

Comments

User 11 | 12/10/2015, 7:11:39 PM

I also tried running another example, but got the same error:

```python
def add(x, y):
    return x + y

job = gl.deploy.job.create(add, environment=c, x=1, y=2)
```


User 15 | 12/10/2015, 8:50:56 PM

Hi,

Did you replace your user name and server name in the dato_dist_path, or did you submit it with "user/name/dd" verbatim? Those values are meant to be replaced with the correct values for your environment; it's not clear from your output that you changed them.

Evan


User 11 | 12/10/2015, 10:10:20 PM

Thank you Evan for your reply,

I installed GraphLab using `./setup_dato-distributed.sh -d hdfs://<myIP>:8020/user/kammar/dd -k ../productCode.ini -c ~/hadoop-2.7.1/etc/hadoop`

I can find the files in my HDFS. I substituted "user" with "kammar" when I ran the code.


User 15 | 12/10/2015, 10:34:25 PM

There should be a server name (the name or IP of the master node) preceding ":8020" I think. Could you try it with that?


User 11 | 12/10/2015, 10:41:14 PM

I actually use the master IP. I think it did not appear earlier because I wrote it as <myIP> without escaping the < and >.

In summary I use: `dato_dist_path='hdfs://<myIP>:8020/user/name/dd'`,
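A quick way to sanity-check that such a value is a fully-qualified HDFS URI with a host (the NameNode name or IP), using only the standard library. The helper below is hypothetical, not part of GraphLab Create:

```python
from urllib.parse import urlparse

# Hypothetical sanity check for a dato_dist_path value: it should be
# an hdfs:// URI with a non-empty host and an absolute path.
def looks_like_hdfs_uri(path):
    parts = urlparse(path)
    return (parts.scheme == 'hdfs'
            and bool(parts.hostname)
            and parts.path.startswith('/'))

print(looks_like_hdfs_uri('hdfs://10.0.0.38:8020/user/kammar/dd'))  # host present
print(looks_like_hdfs_uri('hdfs://:8020/user/kammar/dd'))           # host missing
```

If the host is missing or the scheme is wrong, the cluster-side processes cannot resolve the path even though job submission appears to succeed.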


User 11 | 12/14/2015, 9:50:15 AM

Hi,

Is there any update on this? It looks like my installation does not work, but I am not sure where the problem is. Jobs are submitted successfully, but fail to execute!

Thanks, -Khaled


User 1190 | 12/14/2015, 8:44:54 PM

Hi kammar,

Can you check the output of `yarn logs -applicationId <APPID>`? You can find the APPID in the printout of `gl.deploy.job.create`.

Thanks, -jay


User 11 | 12/15/2015, 12:36:22 AM

Thank you Jay,

I got this output:

```
15/12/14 20:24:52 INFO client.RMProxy: Connecting to ResourceManager at /129.97.171.38:8032
/tmp/logs/kammar/logs/application_1449707595238_0016 does not exist.
Log aggregation has not completed or is not enabled.
```

I used the default log directory during my installation.

Any hints?


User 1190 | 12/15/2015, 7:51:33 AM

Did you try digging for the yarn logs immediately after the job failed? If you type `yarn application -list`, are you able to find the submitted Dato Distributed application?

If the issue is that log aggregation is not enabled, you can follow the steps in this link https://amalgjose.wordpress.com/2015/08/01/enabling-log-aggregation-in-yarn/ to enable log aggregation.
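For reference, enabling log aggregation usually comes down to setting these properties in `yarn-site.xml` on every node and restarting YARN (property names per the Hadoop 2.7 documentation; the retention value is only an example):

```xml
<!-- yarn-site.xml -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- HDFS directory where aggregated container logs are stored -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
<property>
  <!-- keep aggregated logs for 7 days (example value) -->
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```

After the restart, `yarn logs -applicationId <APPID>` should work for newly finished applications.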


User 11 | 12/15/2015, 8:03:27 AM

Thank you Jay for your response,

I can see the dato distributed applications in the yarn UI. Is log aggregation a required feature for Dato distributed applications?


User 1190 | 12/15/2015, 9:53:21 PM

I'm not certain if log aggregation is a hard requirement, but in the case of application failure, we need logs to understand what has happened.


User 11 | 12/18/2015, 3:27:25 PM

Hi,

It turns out the problem is related to the license. This is the error I found in the aggregated logs. I thought it was related to the known bug (http://forum.dato.com/discussion/1076/urgent-license-check-failed-unable-to-validate-license), so I added "export GRAPHLAB_PRODUCT_KEY=<your product key here>" in all ~/.bashrc files.

This is the error I have:

```
command is dato/bins/pipeline/worker_app.py --worker_identifier container_1450450708517_0001_01_000004 --worker_hostname ip-210 --commander_hostname ip-132 --port_start 9100 --port_end 9200 --job_working_dir hdfs://ip-131:8020/user/kammar/dato_distributed/jobs/echo-Dec-18-2015-09-59-38
[ERROR] License check failed: Unable to validate product key. Contact support@dato.com.
Traceback (most recent call last):
  File "dato/bins/pipeline/worker_app.py", line 8, in <module>
    from common import launch_flask_app
  File "/tmp/hadoop-kammar/nm-local-dir/usercache/kammar/appcache/application_1450450708517_0001/filecache/15/bins/pipeline/common.py", line 10, in <module>
    import graphlab as gl
  File "/tmp/tmp.5T6s8fR1fh__GL_REF/datoconda/lib/python2.7/site-packages/graphlab/__init__.py", line 76, in <module>
    import graphlab.toolkits.graph_analytics as graph_analytics
  File "/tmp/tmp.5T6s8fR1fh__GL_REF/datoconda/lib/python2.7/site-packages/graphlab/toolkits/graph_analytics/__init__.py", line 155, in <module>
    import pagerank
  File "/tmp/tmp.5T6s8fR1fh__GL_REF/datoconda/lib/python2.7/site-packages/graphlab/toolkits/graph_analytics/pagerank.py", line 12, in <module>
    from graphlab.toolkits.distributed import run as distributed_run
  File "/tmp/tmp.5T6s8fR1fh__GL_REF/datoconda/lib/python2.7/site-packages/graphlab/toolkits/distributed.py", line 15, in <module>
    from graphlab.deploy.dato_distributed.pipeline.dml import dml as dml
  File "/tmp/tmp.5T6s8fR1fh__GL_REF/datoconda/lib/python2.7/site-packages/graphlab/deploy/__init__.py", line 26, in <module>
    default_session = session.open()
  File "/tmp/tmp.5T6s8fR1fh__GL_REF/datoconda/lib/python2.7/site-packages/graphlab/deploy/session.py", line 582, in open
    return Session(location)
  File "/tmp/tmp.5T6s8fR1fh__GL_REF/datoconda/lib/python2.7/site-packages/graphlab/deploy/session.py", line 112, in __init__
    self.location = make_temp_filename(prefix='tmp_session')
  File "/tmp/tmp.5T6s8fR1fh__GL_REF/datoconda/lib/python2.7/site-packages/graphlab/util/__init__.py", line 699, in make_temp_filename
    temp_location = get_temp_file_location()
  File "/tmp/tmp.5T6s8fR1fh__GL_REF/datoconda/lib/python2.7/site-packages/graphlab/util/__init__.py", line 672, in get_temp_file_location
    unity = glconnect.get_unity()
  File "/tmp/tmp.5T6s8fR1fh__GL_REF/datoconda/lib/python2.7/site-packages/graphlab/connect/main.py", line 308, in get_unity
    assert is_connected(), ENGINE_START_ERROR_MESSAGE
AssertionError: Cannot connect to GraphLab Create engine. Contact support@dato.com for help.

real	0m3.563s
user	0m1.490s
sys	0m0.324s
Error executing control script
End of LogType:gl_worker.stdout
```
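For anyone hitting the same license-check failure: YARN containers run in a non-interactive shell, so a value exported only in ~/.bashrc may not reach the worker process. A minimal sketch to confirm the key is actually visible to a Python process (GRAPHLAB_PRODUCT_KEY is the variable mentioned above; the helper itself is hypothetical):

```python
import os

# Report whether the GraphLab product key is visible to this process.
# If this prints 'missing' inside a container, an export in ~/.bashrc
# did not propagate to the YARN-launched shell.
def product_key_status(env=os.environ):
    key = env.get('GRAPHLAB_PRODUCT_KEY')
    if not key:
        return 'missing'
    return 'set (%d chars)' % len(key)

print(product_key_status())
```

Running this on a worker node (or logging it from the job itself) distinguishes "key never reached the container" from "key present but rejected".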


User 1190 | 12/18/2015, 6:33:33 PM

Hi Kammar,

Does it work after you have set the bashrc?


User 11 | 12/18/2015, 7:03:14 PM


User 1178 | 12/18/2015, 8:23:13 PM

Hi Kammar,

How did you get the productCode.ini file? Did you download it from the http://dato.com website at the same time you downloaded Dato Distributed? If you check the content of the productCode.ini file, you should see the following structure:

```
[Product]
product_key = <your-key>
license_info = <your-licence-info>
```

There is a possibility this file is either corrupted or not in the right format. You may want to send your file to contact@dato.com so that we can validate that.
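If you want to check the structure locally before emailing the file, the standard-library INI parser can verify the section and keys. A sketch based on the layout above (on the Python 2.7 that era of GraphLab Create shipped with, the module is spelled `ConfigParser`):

```python
import configparser

# Example of the expected productCode.ini layout (placeholder values).
INI_TEXT = """\
[Product]
product_key = <your-key>
license_info = <your-licence-info>
"""

# Verify the file has a [Product] section with both required keys.
def validate_license_ini(text):
    cp = configparser.ConfigParser()
    cp.read_string(text)
    if not cp.has_section('Product'):
        return False
    return (cp.has_option('Product', 'product_key')
            and cp.has_option('Product', 'license_info'))

print(validate_license_ini(INI_TEXT))
```

A file that fails this check (missing section header, stray characters, wrong key names) would explain the "Unable to validate product key" error.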

Thanks!

Ping


User 11 | 12/18/2015, 8:27:56 PM

Thank you Ping. The file is indeed not in the right format.

I know my product key, but how can I get my license_info, or how can I download my ini file again?

Thanks, -Khaled


User 1178 | 12/18/2015, 10:04:18 PM

Hi kammar,

Our support is emailing you the license now. You should get it shortly.

Thanks!

Ping