Scheduling GraphLab Create jobs across multiple YARN containers

User 911 | 11/12/2014, 2:53:16 AM


I've got a graphlab.recommender model that I'm trying to deploy to my YARN cluster through a Hadoop environment. However, I haven't been able to get the job distributed to containers on other machines. The ResourceManager log shows:

<blockquote class="Quote">2014-11-12 01:13:05,021 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_1415754081841_0003_01_000001 : $JAVA_HOME/bin/java -Xmx10m org.graphlab.hadoop.yarn.applications.ApplicationMaster --container_memory 5100 --container_vcores 12 --num_containers 1 --priority 0 1><LOG_DIR>/glAppMaster.stdout 2><LOG_DIR>/glAppMaster.stderr </blockquote>

This makes it look as though I haven't configured the Hadoop environment correctly, since only one container is requested. However, I don't see any num_containers-like option on the graphlab.deploy.environment.Hadoop object. Where in the stack is this configured?



User 10 | 11/20/2014, 2:06:12 AM

Hi Wes,

Thanks for trying out GraphLab Data Pipelines on Hadoop! GraphLab Create does not currently distribute Jobs to multiple YARN containers, which is why this is not exposed in the Hadoop Environment object.

I would love to chat with you to understand more about how you would like to see Jobs run in a distributed fashion in YARN clusters. Can you email me at to start that conversation?