Hello world does not work on Dato Distributed! (exitStatus=21, diagnostics=Exception from containe)

I am new to GraphLab Create. I followed the installation guide on https://dato.com/learn/userguide/deployment/pipeline-hadoop-setup.html. However, when I run the sample Hello World code, I get an error "exitStatus=21, diagnostics=Exception from container-launch.". The complete task log is attached.

This is my Hadoop 2.7.1 cluster. It has 3 machines, one of which acts as the master. I submit the job from the master machine (<myIP-38>). All machines run Ubuntu 14.04.

This is the code I ran (copied from the installation guide):

` import graphlab as gl

# Create cluster
c = gl.deploy.hadoop_cluster.create(
    name='test-cluster',
    dato_dist_path='hdfs://<myIP-38>:8020/user/name/dd',
    hadoop_conf_dir='~/yarn-config')

def echo(input):
    return input

j = gl.deploy.job.create(echo, environment=c, input='hello world!') `

Thanks, -Khaled

` 15/12/10 11:34:36 INFO applications.ApplicationMaster: Initializing ApplicationMaster
Application master for app, appId=4, clustertimestamp=1449707595238, attemptId=1

GRAPHLAB VALS datoDistribInstallhdfs://<myIP-38>:8020/user/name/dd jobWorkingDir=hdfs://<myIP-38>:8020/user/kammar/dato_distributed/jobs/echo-Dec-10-2015-11-34-17

ations.ApplicationMaster: Got container status for containerID=container1449707595238000401000003, state=COMPLETE, exitStatus=21, diagnostics=Exception from container-launch.
Container id: container1449707595238000401000003
Exit code: 21
Stack trace: ExitCodeException exitCode=21:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 21 `


I also tried running a different function, but got the same error:

` def add(x, y):
    return x + y

job = gl.deploy.job.create(add, environment=c, x=1, y=2) `

Did you replace the user name and server name in `dato_dist_path` with your own, or did you submit it as "user/name/dd" verbatim? Those are placeholders that are meant to be replaced with the correct values for your environment. It's not clear from your output that you changed them.


Thank you Evan for your reply,

I installed GraphLab using `./setup_dato-distributed.sh -d hdfs://<myIP>:8020/user/kammar/dd -k ../productCode.ini -c ~/hadoop-2.7.1/etc/hadoop`

I can find the files in my HDFS. I substitute "user" with "kammar" when I run the code.

There should be a server name (the name or IP of the master node) preceding ":8020" I think. Could you try it with that?

I do actually use the master IP. I think it did not appear because I wrote it as <myIP> without escaping the < and >.

In summary, I use: `dato_dist_path='hdfs://<myIP>:8020/user/name/dd'`

Is there any update on this? It looks like my installation does not work, but I am not sure where the problem is. Jobs are submitted successfully, but fail to execute!

Thanks, -Khaled

Hi kammar,

Can you check the output of `yarn logs -applicationId APPID`? You can find the APPID in the printout of `gl.deploy.job.create`.
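For anyone who wants to capture that output from a script rather than a terminal, here is a minimal sketch. It assumes the `yarn` CLI is on the PATH of the machine you run it from; the helper names `build_yarn_logs_command` and `fetch_yarn_logs` are made up for this example and are not part of GraphLab Create or Hadoop.

```python
import subprocess


def build_yarn_logs_command(app_id):
    """Return the argv list for `yarn logs -applicationId <app_id>`."""
    # YARN application IDs look like application_<timestamp>_<sequence>.
    if not app_id.startswith("application_"):
        raise ValueError("expected an ID like application_<timestamp>_<sequence>")
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_yarn_logs(app_id):
    """Run the yarn CLI and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(
        build_yarn_logs_command(app_id),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the CLI prints some messages to stderr
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout
```

Calling `fetch_yarn_logs(...)` with the application ID printed by `gl.deploy.job.create` should return the same text you would see in the terminal, which makes it easier to save or search.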

Thanks, -jay

Thank you Jay,

I got this output:

` 15/12/14 20:24:52 INFO client.RMProxy: Connecting to ResourceManager at /
/tmp/logs/kammar/logs/application14497075952380016 does not exist.
Log aggregation has not completed or is not enabled. `

I used the default log directory during my installation.

Any hints?

Did you try looking for the YARN logs immediately after the job failed? If you type `yarn application -list`, are you able to find the submitted Dato Distributed application?

If the issue is that log aggregation is not enabled, you can follow the steps at https://amalgjose.wordpress.com/2015/08/01/enabling-log-aggregation-in-yarn/ to enable it.
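As a quick way to see whether aggregation is switched on, the sketch below checks a `yarn-site.xml` for the standard `yarn.log-aggregation-enable` property. The function name `log_aggregation_enabled` is hypothetical, and the sketch only inspects the file you hand it; it does not tell you whether the running NodeManagers have picked up the setting.

```python
import xml.etree.ElementTree as ET


def log_aggregation_enabled(yarn_site_xml):
    """Return True if yarn.log-aggregation-enable is set to true in the XML text."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip().lower()
        if name == "yarn.log-aggregation-enable":
            return value == "true"
    # Property not present: YARN's shipped default for it is false.
    return False


SAMPLE = """<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>"""

print(log_aggregation_enabled(SAMPLE))
```

To use it on a real cluster, read the contents of the `yarn-site.xml` in your Hadoop configuration directory and pass them to the function. If it reports false, aggregated logs will not appear under the remote log directory, so `yarn logs` has nothing to show.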

Thank you Jay for your response,

I can see the Dato Distributed applications in the YARN UI. Is log aggregation a required feature for Dato Distributed applications?

I'm not certain if log aggregation is a hard requirement, but in the case of application failure, we need logs to understand what has happened.

It turns out the problem is related to the license. This is the error I found in the aggregated logs. I thought it was related to the known bug (http://forum.dato.com/discussion/1076/urgent-license-check-failed-unable-to-validate-license), so I added "export GRAPHLAB_PRODUCT_KEY=<your product key here>" to all ~/.bashrc files.

This is the error I have:

` command is dato/bins/pipeline/workerapp.py --workeridentifier container1450450708517000101000004 --workerhostname ip-210 --commanderhostname ip-132 --portstart 9100 --portend 9200 --jobworkingdir hdfs://ip-131:8020/user/kammar/datodistributed/jobs/echo-Dec-18-2015-09-59-38
[ERROR] License check failed: Unable to validate product key. Contact support@dato.com.
Traceback (most recent call last):
  File "dato/bins/pipeline/workerapp.py", line 8, in <module>
    from common import launchflaskapp
  File "/tmp/hadoop-kammar/nm-local-dir/usercache/kammar/appcache/application14504507085170001/filecache/15/bins/pipeline/common.py", line 10, in <module>
    import graphlab as gl
  File "/tmp/tmp.5T6s8fR1fhGL_REF/datoconda/lib/python2.7/site-packages/graphlab/init.py", line 76, in <module>
    import graphlab.toolkits.graphanalytics as graphanalytics
  File "/tmp/tmp.5T6s8fR1fhGLREF/datoconda/lib/python2.7/site-packages/graphlab/toolkits/graphanalytics/init.py", line 155, in <module>
    import pagerank
  File "/tmp/tmp.5T6s8fR1fhGLREF/datoconda/lib/python2.7/site-packages/graphlab/toolkits/graphanalytics/pagerank.py", line 12, in <module>
    from graphlab.toolkits.distributed import run as distributed_run
  File "/tmp/tmp.5T6s8fR1fhGLREF/datoconda/lib/python2.7/site-packages/graphlab/toolkits/distributed.py", line 15, in <module>
    from graphlab.deploy.datodistributed.pipeline.dml import dml as dml
  File "/tmp/tmp.5T6s8fR1fhGL_REF/datoconda/lib/python2.7/site-packages/graphlab/deploy/init.py", line 26, in <module>
    defaultsession = session.open()
  File "/tmp/tmp.5T6s8fR1fhGLREF/datoconda/lib/python2.7/site-packages/graphlab/deploy/session.py", line 582, in open
    return Session(location)
  File "/tmp/tmp.5T6s8fR1fh__GLREF/datoconda/lib/python2.7/site-packages/graphlab/deploy/session.py", line 112, in init
    self.location = maketempfilename(prefix='tmpsession')
  File "/tmp/tmp.5T6s8fR1fhGL_REF/datoconda/lib/python2.7/site-packages/graphlab/util/init.py", line 699, in maketempfilename
    templocation = gettempfilelocation()
  File "/tmp/tmp.5T6s8fR1fhGLREF/datoconda/lib/python2.7/site-packages/graphlab/util/init.py", line 672, in gettempfilelocation
    unity = glconnect.getunity()
  File "/tmp/tmp.5T6s8fR1fh__GLREF/datoconda/lib/python2.7/site-packages/graphlab/connect/main.py", line 308, in getunity
    assert isconnected(), ENGINESTARTERROR_MESSAGE
AssertionError: Cannot connect to GraphLab Create engine. Contact support@dato.com for help.

real 0m3.563s
user 0m1.490s
sys 0m0.324s
Error executing control script
End of LogType:gl_worker.stdout `

Hi Kammar,

Does it work after you have set the bashrc?

Hi Kammar,

How did you get the productCode.ini file? Did you download it from the http://dato.com website at the same time you downloaded Dato Distributed? If you check the content of the productCode.ini file, you should see the following structure:

` [Product]
product_key = <your-key>
license_info = <your-licence-info> `

There is a possibility that this file is either corrupted or not in the right format. You may want to send your file to contact@dato.com so that we can validate it.
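If you want to check the layout yourself before sending the file, the sketch below tests only what is described above: a `[Product]` section with non-empty `product_key` and `license_info` entries. The function name `check_product_code_ini` is invented for this example, it is written for Python 3's `configparser`, and it cannot tell you whether the key or license is actually valid.

```python
import configparser

REQUIRED_KEYS = ("product_key", "license_info")


def check_product_code_ini(text):
    """Return a list of layout problems; an empty list means the layout matches."""
    # interpolation=None so that a '%' inside a key or license value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.Error as exc:
        return ["not parseable as INI: {}".format(exc)]
    if not parser.has_section("Product"):
        return ["missing [Product] section"]
    problems = []
    for key in REQUIRED_KEYS:
        if not parser.get("Product", key, fallback="").strip():
            problems.append("missing or empty {}".format(key))
    return problems
```

Pass it the text of your productCode.ini, for example `check_product_code_ini(open("productCode.ini").read())`. Any strings it returns point at the part of the file that does not match the expected structure.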



Thank you Ping. The file is not in the right format.

I know my product key, but how can I get my license_info, or how can I download my ini file again?

Thanks, -Khaled

Hi kammar,

Our support is emailing you the license now. You should get it shortly.