Unable to evaluate lambdas. lambda workers did not start

User 1191 | 2/2/2015, 9:55:34 PM

Hello, as a follow-up to http://forum.dato.com/discussion/comment/2490/#Comment_2490, I created a Python virtualenv from scratch again (with the latest Python packages and graphlab). However, I'm still getting the same error, except that this time there is a little more information in the notebook:

<pre class="CodeBlock"><code>d = DataFrame({'one' : Series([1., 2., 3.], index=['a', 'b', 'c']), 'two' : Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})
s = SFrame(d)
s["one"].apply ( lambda x : isnan(x))</code></pre>

<pre class="CodeBlock"><code>RuntimeError Traceback (most recent call last)
/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/IPython/core/formatters.pyc in call(self, obj)
    693 typepprinters=self.typeprinters,
    694 deferredpprinters=self.deferredprinters)
--> 695 printer.pretty(obj)
    696 printer.flush()
    697 return stream.getvalue()

/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/IPython/lib/pretty.pyc in pretty(self, obj)
    399 if callable(meth):
    400 return meth(obj, self, cycle)
--> 401 return defaultpprint(obj, self, cycle)
    402 finally:
    403 self.end_group()

/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/IPython/lib/pretty.pyc in defaultpprint(obj, p, cycle)
    519 if safegetattr(klass, 'repr', None) not in baseclassreprs:
    520 # A user-provided repr. Find newlines and replace them with p.break()
--> 521 reprpprint(obj, p, cycle)
    522 return
    523 p.begingroup(1, '<')

/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/IPython/lib/pretty.pyc in reprpprint(obj, p, cycle)
    701 """A pprint that just redirects to the normal repr function."""
    702 # Find newlines and replace them with p.break()
--> 703 output = repr(obj)
    704 for idx,outputline in enumerate(output.splitlines()):
    705 if idx:

/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/graphlab/data_structures/sarray.pyc in repr(self)
    324 ret = "dtype: " + str(self.dtype().name) + "\n"
    325 ret = ret + "Rows: " + str(self.size()) + "\n"
--> 326 ret = ret + str(self)
    327 return ret
    328

/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/graphlab/data_structures/sarray.pyc in str(self)
    336 headln = str(list(self.head_str(100)))
    337 else:
--> 338 headln = str(list(self.head(100)))
    339 if (self.size() > 100):
    340 # cut the last close bracket

/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/graphlab/data_structures/sarray.pyc in head(self, n)
    681 [0, 1, 2, 3, 4]
    682 """
--> 683 return SArray(proxy=self.proxy.head(n))
    684
    685 def vector_slice(self, start, end=None):

cysarray.pyx in graphlab.cython.cysarray.UnitySArrayProxy.head()

cysarray.pyx in graphlab.cython.cysarray.UnitySArrayProxy.head()

RuntimeError: Runtime Exception. Unable to evaluate lambdas. lambda workers did not start</code></pre>

Comments

User 19 | 2/2/2015, 10:37:06 PM

Hi fjanoos,

The following code works for me when using a fresh virtualenv.

<pre>import graphlab
from graphlab import SFrame
from pandas import DataFrame, Series
from math import isnan
d = DataFrame({'one' : Series([1., 2., 3.], index=['a', 'b', 'c']), 'two' : Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})
s = SFrame(d)
s["one"].apply(lambda x : isnan(x))</pre>

Can you send us the end of your server log to help debug the issue? The location should be mentioned upon engine startup. For example:

<pre>[INFO] Start server at: ipc:///tmp/graphlabserver-31357
      - Server binary: /home/chris/venv12/lib/python2.7/site-packages/graphlab/unityserver
      - Server log: /tmp/graphlabserver1422916299.log
[INFO] GraphLab Server Version: 1.2.1</pre>
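If it helps, here is a minimal sketch for pulling the log path out of a startup message like the one above and reading the end of that file. The helper names `find_server_log` and `tail` are made up for illustration, and the pattern assumes the message contains a `Server log: <path>` fragment as in the example.

```python
import re

# Matches the "Server log: <path>" fragment of the startup message.
SERVER_LOG_PATTERN = re.compile(r"Server log:\s*(\S+)")


def find_server_log(startup_text):
    """Return the log path mentioned in the startup text, or None."""
    match = SERVER_LOG_PATTERN.search(startup_text)
    return match.group(1) if match else None


def tail(path, n=50):
    """Return the last n lines of a text file."""
    with open(path) as handle:
        return handle.readlines()[-n:]


# The startup message from the example above, joined into one string.
startup = (
    "[INFO] Start server at: ipc:///tmp/graphlabserver-31357 "
    "- Server binary: /home/chris/venv12/lib/python2.7/site-packages/graphlab/unityserver "
    "- Server log: /tmp/graphlabserver1422916299.log "
    "[INFO] GraphLab Server Version: 1.2.1"
)
print(find_server_log(startup))
```

Calling `tail(find_server_log(startup))` on the machine that produced the message would then give the lines worth attaching to a post.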


User 1191 | 2/2/2015, 11:17:31 PM

The log is attached.

Thanks.


User 92 | 2/3/2015, 12:25:10 AM

Hi fjanoos,

This error is usually caused by the environment settings. Can you try a few things:

  1. Locate libpython2.7.so and add its directory to your LD_LIBRARY_PATH:

     locate libpython2.7.so
     export LD_LIBRARY_PATH=[a directory containing libpython2.7.so]

  2. Add your sys.exec_prefix to your PYTHONPATH. Inside ipython:

     import sys
     print sys.exec_prefix

     Take the output and add it to your PYTHONPATH:

     export PYTHONPATH=$PYTHONPATH:<the exec_prefix output>

Please let us know the result.

Ping
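The two steps above can also be sketched from inside Python. This is only an illustration: `suggested_exports` is a hypothetical helper, and using `sysconfig.get_config_var("LIBDIR")` as a stand-in for `locate libpython2.7.so` is an assumption, since it reports where the interpreter was configured to install its libraries rather than where the shared library actually is.

```python
import os
import sys
import sysconfig


def suggested_exports(libdir, exec_prefix, pythonpath=""):
    """Build the two export lines from the steps above as strings."""
    parts = [p for p in pythonpath.split(os.pathsep) if p]
    if exec_prefix not in parts:
        parts.append(exec_prefix)
    return [
        "export LD_LIBRARY_PATH=%s" % libdir,
        "export PYTHONPATH=%s" % os.pathsep.join(parts),
    ]


# Assumption: LIBDIR is a reasonable first guess for the libpython directory.
libdir = sysconfig.get_config_var("LIBDIR") or ""
current = os.environ.get("PYTHONPATH", "")
for line in suggested_exports(libdir, sys.exec_prefix, current):
    print(line)
```

The printed lines are suggestions to review before pasting into a shell, not something to apply blindly.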


User 1191 | 2/3/2015, 3:59:35 PM

Hi Ping,

I added the following environment variables

<blockquote class="Quote">export LD_LIBRARY_PATH="/usr/lib/"
export PYTHONPATH=$PYTHONPATH:'/home/firdaus/.virtualenv'</blockquote>

and restarted ipython notebook.

However, I am still getting the same errors.

Also, I don't know if this helps, but it looks like the pylambda worker processes have started:

<blockquote class="Quote">(.virtualenv)[firdaus@hs: 15:58:46] {~}[43]>ps
  PID TTY          TIME CMD
 8734 pts/0    00:00:06 ssh
30334 pts/0    00:00:00 ipython
30541 pts/0    00:00:01 python
30571 pts/0    00:00:01 unityserver
30608 pts/0    00:00:00 pylambda_worker
30609 pts/0    00:00:00 pylambda_worker
30610 pts/0    00:00:00 pylambda_worker
30611 pts/0    00:00:00 pylambda_worker
30612 pts/0    00:00:00 pylambda_worker
30613 pts/0    00:00:00 pylambda_worker
30614 pts/0    00:00:00 pylambda_worker
30615 pts/0    00:00:00 pylambda_worker
30616 pts/0    00:00:00 pylambda_worker
30617 pts/0    00:00:00 pylambda_worker
30620 pts/0    00:00:00 pylambda_worker
30621 pts/0    00:00:00 pylambda_worker
30622 pts/0    00:00:00 pylambda_worker
30623 pts/0    00:00:00 pylambda_worker
30624 pts/0    00:00:00 pylambda_worker
30626 pts/0    00:00:00 pylambda_worker
30925 pts/0    00:00:00 ps</blockquote>


User 1191 | 2/3/2015, 4:00:05 PM

Also, I am running python 2.7.3 -- could that be an issue ?


User 1190 | 2/3/2015, 7:05:01 PM

Hi @fjanoos,

In your virtualenv, can you try running the following command and see if there is any obvious error?

<pre><code>$ cd /home/firdaus/.virtualenv/
$ . bin/activate
$ env GLSYSPATH=$(python -c 'import sys; print ":".join(sys.path)') ./lib/python2.7/site-packages/graphlab/pylambda_worker ipc:///tmp/test
</code></pre>

Thanks, Jay


User 92 | 2/3/2015, 7:07:43 PM

Hi fjanoos,

python 2.7.3 should be fine. Can you run the following in a Python session inside your virtualenv and send us the output?

<pre>>>> import sys
>>> print sys.path
>>> print sys.exec_prefix
>>> import os
>>> print os.environ['PYTHONPATH']</pre>


User 1191 | 2/3/2015, 11:58:15 PM

@"Jay Gu"

<code class="CodeInline">(.virtualenv)[firdaus@hs: 23:52:38] {~/.virtualenv}[55]>env GLSYSPATH=$(python -c 'import sys; print ":".join(sys.path)') ./lib/python2.7/site-packages/graphlab/pylambda_worker ipc:///tmp/test</code>

<blockquote class="Quote">Bound to ipc:///tmp/test
Bound to ipc:///tmp/testcontrol
Bound to ipc:///tmp/teststatus
1423007565 : INFO: (commserver:109): Server listening on: ipc:///tmp/test
1423007565 : INFO: (commserver:111): Server Control listening on: ipc:///tmp/testcontrol
1423007565 : INFO: (commserver:113): Server status published on: ipc:///tmp/teststatus
1423007565 : INFO: (registerfunction:446): Registering function objectfactorybase::makeobject
1423007565 : INFO: (registerfunction:446): Registering function objectfactorybase::ping
1423007565 : INFO: (registerfunction:446): Registering function objectfactorybase::deleteobject
1423007565 : INFO: (registerfunction:446): Registering function objectfactorybase::getstatuspublishaddress
1423007565 : INFO: (registerfunction:446): Registering function objectfactorybase::getcontroladdress
1423007565 : INFO: (registerfunction:446): Registering function objectfactorybase::syncobjects
1423007565 : INFO: (registerfunction:446): Registering function lambdaevaluatorinterface::makelambda
1423007565 : INFO: (registerfunction:446): Registering function lambdaevaluatorinterface::releaselambda
1423007565 : INFO: (registerfunction:446): Registering function lambdaevaluatorinterface::bulkeval
1423007565 : INFO: (registerfunction:446): Registering function lambdaevaluatorinterface::bulkevaldict
1423007565 : INFO: (registerfunction:446): Registering function graphlambdaevaluatorinterface::evaltripleapply
1423007565 : INFO: (registerfunction:446): Registering function graphlambdaevaluatorinterface::init
1423007565 : INFO: (registerfunction:446): Registering function graphlambdaevaluatorinterface::loadvertexpartition
1423007565 : INFO: (registerfunction:446): Registering function graphlambdaevaluatorinterface::isloaded
1423007565 : INFO: (registerfunction:446): Registering function graphlambdaevaluatorinterface::updatevertexpartition
1423007565 : INFO: (registerfunction:446): Registering function graphlambdaevaluatorinterface::getvertexpartitionexchange
1423007565 : INFO: (registerfunction:446): Registering function graphlambdaevaluator_interface::clear</blockquote>

@wangping

<pre class="CodeBlock"><code>import sys
print sys.path
print sys.exec_prefix
import os
print os.environ['PYTHONPATH']</code></pre>

<blockquote class="Quote">['', '/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/pip-1.1-py2.7.egg', '/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/distribute-0.7.3-py2.7.egg', '/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/setuptools-12.0.5-py2.7.egg', '/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/tornado-4.1b2-py2.7-linux-x86_64.egg', '/home/firdaus/projects/python', '/home/firdaus', '/home/firdaus/.virtualenv/lib/python2.7', '/home/firdaus/.virtualenv/lib/python2.7/plat-linux2', '/home/firdaus/.virtualenv/lib/python2.7/lib-tk', '/home/firdaus/.virtualenv/lib/python2.7/lib-old', '/home/firdaus/.virtualenv/lib/python2.7/lib-dynload', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk', '/home/firdaus/.virtualenv/local/lib/python2.7/site-packages', '/home/firdaus/.virtualenv/local/lib/python2.7/site-packages/IPython/extensions', '/home/firdaus/.ipython', '/u/firdaus/projects/python/']
/home/firdaus/.virtualenv
/home/firdaus/projects/python:</blockquote>


User 1191 | 2/4/2015, 4:58:12 PM

Hello, I have made some progress on this issue. The IPython that was starting up was not the one in the virtualenv. Apparently others have run into this problem as well: http://stackoverflow.com/questions/20327621/calling-ipython-from-a-virtualenv

If I run "hash -r" before starting ipython, the lambda sometimes executes, but not always. When it does execute, I get messages like this:

<code class="CodeInline">In [6]: s["one"].apply ( lambda x : isnan(x))</code>

<blockquote class="Quote">Out[6]:
PROGRESS: Less than 16 successfully started. Using only 2 workers.
PROGRESS: All operations will proceed as normal, but lambda operations will not be able to use all available cores.
PROGRESS: To help us diagnose this issue, please send the log file to feedback@graphlab.com.
PROGRESS: (The location of the log file is printed at the start of the GraphLab server).

dtype: int
Rows: 4
[0, 0, 0, 1]</blockquote>

Please find attached the log.
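For anyone hitting the same mismatch, a quick check along these lines may help confirm which interpreter a session is actually running. It is a rough sketch: `runs_inside` is a hypothetical helper, and it relies on the `VIRTUAL_ENV` variable that the virtualenv `activate` script sets. Paths are compared without resolving symlinks, because the interpreter inside a virtualenv is often a symlink to the system one.

```python
import os
import sys


def runs_inside(executable, venv_dir):
    """True if the interpreter path lies under the virtualenv directory."""
    if not venv_dir:
        return False
    exe = os.path.abspath(executable)
    root = os.path.abspath(venv_dir)
    return exe.startswith(root + os.sep)


venv = os.environ.get("VIRTUAL_ENV")  # set by bin/activate
print("interpreter: %s" % sys.executable)
print("VIRTUAL_ENV: %s" % venv)
print("inside virtualenv: %s" % runs_inside(sys.executable, venv))
```

Running this inside the notebook or ipython session, rather than in a plain `python` shell, is what shows whether the wrong IPython was picked up.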


User 1191 | 2/4/2015, 6:37:02 PM

I investigated this a little further, and this is what I found.

Originally I was running my code on a virtual machine with 16-cores and 96GB of RAM. This is the configuration that gave me lambda worker errors.

I then resized the virtual-machine to 4 cores and 8GB ram - and in this case, the code works without any problems.

I then reproduced this on a new virtual machine: SFrame.apply worked on the VM with 4 cores but threw errors after I resized it to 16 cores.


User 1190 | 2/4/2015, 7:43:15 PM

Thank you @fjanoos for the investigation. We will dig into this issue and send you an update soon.


User 1190 | 2/4/2015, 8:05:37 PM

@fjanoos, I tried the lambda apply on a 32-core EC2 instance and it works well, so I suspect there could be a limitation of the VM that we are not aware of. Could you provide more details about the VM system you are using?

Btw, here is a workaround that lets you keep using the 16-core VM while limiting the number of lambda workers.

<pre><code>
import graphlab

# change worker limit to 8
graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 8)
graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_GRAPH_LAMBDA_WORKERS', 8)

# verify
graphlab.get_runtime_config()
</code></pre>

-jay
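A variation on the workaround above, sketched under some assumptions: instead of hard-coding 8, the limit is derived from the core count and capped. `pick_worker_limit` is a hypothetical helper, the cap of 8 is simply the value from this thread, and the configuration key spellings (with underscores) are how I understand the GraphLab Create runtime config names, so please verify them against `graphlab.get_runtime_config()`.

```python
import multiprocessing


def pick_worker_limit(cpu_count, cap=8):
    """One worker per core, never more than `cap`, and at least one."""
    return max(1, min(cpu_count, cap))


limit = pick_worker_limit(multiprocessing.cpu_count())

try:
    import graphlab
except ImportError:
    graphlab = None  # GraphLab Create is not installed in this environment

if graphlab is not None:
    # Assumed key names; check them with graphlab.get_runtime_config().
    graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', limit)
    graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_GRAPH_LAMBDA_WORKERS', limit)

print("lambda worker limit: %d" % limit)
```

On a 16-core VM this yields 8, and on a 4-core VM it yields 4.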


User 1191 | 2/4/2015, 8:50:28 PM

@JayGu I will try this out and get back to you.

Also, what kind of information do you want about the VM ?

Thanks


User 1190 | 2/4/2015, 8:58:40 PM

For example, are you using VirtualBox or VMware? Which configuration and which OS? Please give as much detail as possible so that we can get close to reproducing your problem.


User 1191 | 2/4/2015, 11:57:01 PM

Hi @"Jay Gu"

The vm is from a cloud system deployed in my company based on VMware but configured for our internal cluster.

<blockquote class="Quote">[firdaus@fj: 23:37:56] {~}[1]>uname -a
Linux fj.firdaus.cnje2 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 GNU/Linux</blockquote>

It has 16 cores, each with this spec:

<blockquote class="Quote">processor : 15
vendorid : GenuineIntel
cpu family : 6
model : 42
model name : Intel Xeon E312xx (Sandy Bridge)
stepping : 1
microcode : 0x1
cpu MHz : 2499.998
cache size : 4096 KB
physical id : 15
siblings : 1
core id : 0
cpu cores : 1
apicid : 15
initial apicid : 15
fpu : yes
fpuexception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constanttsc repgood nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse41 sse42 x2apic popcnt tscdeadlinetimer aes xsave avx f16c rdrand hypervisor lahflm xsaveopt fsgsbase smep erms
bogomips : 4999.99
clflush size : 64
cachealignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:</blockquote>

Also, changing the lambda-worker limit to 8 seems to have worked. Without this setting, the execution of SFrame.apply is very fragile. Again, it works sometimes but with the message that it cannot start / connect to all workers. Let me know if I can help you figure out what's going on.

Thanks.


User 1190 | 2/5/2015, 12:26:54 AM

It looks like some resource limit in the VM prevents GraphLab from using all 16 workers. We will look into it. Thanks for the detailed information, and let me know anytime if you need further help.
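Nothing in this thread confirms which limit, if any, is responsible, but as a starting point the per-process limits visible to Python can be listed with the standard `resource` module (Unix only). The choice of limits below is a guess at what might matter when many worker processes are spawned, and `read_limits` and `show` are illustrative names.

```python
import resource

# Guess: limits that could matter when spawning many worker processes.
LIMIT_NAMES = ("RLIMIT_NPROC", "RLIMIT_NOFILE", "RLIMIT_AS")


def read_limits(names=LIMIT_NAMES):
    """Return {name: (soft, hard)} for the limits this platform defines."""
    limits = {}
    for name in names:
        key = getattr(resource, name, None)
        if key is not None:
            limits[name] = resource.getrlimit(key)
    return limits


def show(value):
    """Render a limit value, spelling out the 'no limit' marker."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


for name, (soft, hard) in sorted(read_limits().items()):
    print("%-14s soft=%s hard=%s" % (name, show(soft), show(hard)))
```

Comparing this output between the 4-core and 16-core configurations of the same VM might show whether anything differs.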


User 1191 | 12/17/2015, 4:32:49 AM

hi Ping,

I added the following two env variables:

<blockquote class="Quote">export LD_LIBRARY_PATH="/usr/lib/libpython2.7.so"
export PYTHONPATH=$PYTHONPATH:'/home/firdaus/.virtualenv'</blockquote>

and restarted ipython notebook.

However, I am still getting the same error.

Also, when I do ps, I see the following processes as started ...

<blockquote class="Quote">26778 pts/0    00:00:02 unityserver
26813 pts/0    00:00:00 pylambda_worker
26814 pts/0    00:00:00 pylambda_worker
26815 pts/0    00:00:00 pylambda_worker
26816 pts/0    00:00:00 pylambda_worker
26817 pts/0    00:00:00 pylambda_worker
26818 pts/0    00:00:00 pylambda_worker
26819 pts/0    00:00:00 pylambda_worker
26820 pts/0    00:00:00 pylambda_worker
26821 pts/0    00:00:00 pylambda_worker
26822 pts/0    00:00:00 pylambda_worker
26823 pts/0    00:00:00 pylambda_worker
26824 pts/0    00:00:00 pylambda_worker
26825 pts/0    00:00:00 pylambda_worker
26826 pts/0    00:00:00 pylambda_worker
26829 pts/0    00:00:00 pylambda_worker
26832 pts/0    00:00:00 pylambda_worker</blockquote>

The tail of the log file is:

<blockquote class="Quote">1422978527 : INFO: (spawn_worker:51): Worker pid = 26817
1422978527 : INFO: (spawn_worker:35): Start lambda worker at ipc:///var/tmp/graphlab-firdaus/26778/000002 using binary: /home/firdaus/.virtualenv/lib/python2.7/site-packages/graphlab/pylambda_worker
1422978527 : INFO: (spawn_worker:35): Start lambda worker at ipc:///var/tmp/graphlab-firdaus/26778/000000 using binary: /home/firdaus/.virtualenv/lib/python2.7/site-packages/graphlab/pylambda_worker
1422978527 : INFO: (spawn_worker:35): Start lambda worker at ipc:///var/tmp/graphlab-firdaus/26778/000010 using binary: /home/firdaus/.virtualenv/lib/python2.7/site-packages/graphlab/pylambda_worker
1422978527 : INFO: (spawn_worker:35): Start lambda worker at ipc:///var/tmp/graphlab-firdaus/26778/000009 using binary: /home/firdaus/.virtualenv/lib/python2.7/site-packages/graphlab/pylambda_worker
1422978527 : INFO: (spawn_worker:35): Start lambda worker at ipc:///var/tmp/graphlab-firdaus/26778/000008 using binary: /home/firdaus/.virtualenv/lib/python2.7/site-packages/graphlab/pylambda_worker
1422978527 : INFO: (spawn_worker:35): Start lambda worker at ipc:///var/tmp/graphlab-firdaus/26778/000012 using binary: /home/firdaus/.virtualenv/lib/python2.7/site-packages/graphlab/pylambda_worker
1422978527 : INFO: (spawn_worker:35): Start lambda worker at ipc:///var/tmp/graphlab-firdaus/26778/000013 using binary: /home/firdaus/.virtualenv/lib/python2.7/site-packages/graphlab/pylambda_worker
1422978527 : INFO: (spawn_worker:27): Start lambda worker at ipc:///var/tmp/graphlab-firdaus/26778/000015 using binary: /home/firdaus/.virtualenv/lib/python2.7/site-packages/graphlab/pylambda_worker
1422978527 : INFO: (spawn_worker:51): Worker pid = 26818
1422978527 : INFO: (spawn_worker:51): Worker pid = 26825
1422978527 : INFO: (spawn_worker:51): Worker pid = 26821
1422978527 : INFO: (spawn_worker:51): Worker pid = 26822
1422978527 : INFO: (spawn_worker:51): Worker pid = 26820
1422978527 : INFO: (spawn_worker:51): Worker pid = 26823
1422978527 : INFO: (spawn_worker:51): Worker pid = 26824
1422978527 : INFO: (spawn_worker:51): Worker pid = 26829
1422978527 : INFO: (spawn_worker:51): Worker pid = 26826
1422978527 : INFO: (spawn_worker:51): Worker pid = 26816
1422978527 : INFO: (spawn_worker:35): Start lambda worker at ipc:///var/tmp/graphlab-firdaus/26778/000015 using binary: /home/firdaus/.virtualenv/lib/python2.7/site-packages/graphlab/pylambda_worker
1422978527 : INFO: (spawn_worker:51): Worker pid = 26819
1422978527 : INFO: (spawn_worker:51): Worker pid = 26832
1422978539 : INFO: (spawn_worker:62): Fail connecting to worker at ipc:///var/tmp/graphlab-firdaus/26778/000006. Status: Communication Failure
1422978539 : INFO: (spawn_worker:62): Fail connecting to worker at ipc:///var/tmp/graphlab-firdaus/26778/000009. Status: Communication Failure
1422978539</blockquote>
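A log tail like the one above can be summarized with a few lines of Python, which makes it easier to see how many workers were spawned and which endpoints could not be reached. This is a sketch based only on the two message shapes visible in this thread ("Worker pid = N" and "Fail connecting to worker at ADDRESS. Status: TEXT"); the real log may contain other variants, and `summarize` is an illustrative name.

```python
import re

# Patterns derived from the log lines quoted in this thread.
STARTED = re.compile(r"Worker pid = (\d+)")
FAILED = re.compile(r"Fail connecting to worker at (\S+)\. Status: (.*)")


def summarize(log_lines):
    """Return (spawned pids, [(address, status), ...]) found in the lines."""
    pids, failures = [], []
    for line in log_lines:
        started = STARTED.search(line)
        if started:
            pids.append(int(started.group(1)))
        failed = FAILED.search(line)
        if failed:
            failures.append((failed.group(1), failed.group(2).strip()))
    return pids, failures


sample = [
    "1422978527 : INFO: (spawn_worker:51): Worker pid = 26819",
    "1422978527 : INFO: (spawn_worker:51): Worker pid = 26832",
    "1422978539 : INFO: (spawn_worker:62): Fail connecting to worker at "
    "ipc:///var/tmp/graphlab-firdaus/26778/000006. Status: Communication Failure",
]
pids, failures = summarize(sample)
print("spawned: %d, failed to connect: %d" % (len(pids), len(failures)))
for address, status in failures:
    print("  %s (%s)" % (address, status))
```

Feeding it the lines of the actual server log file, for example via `open(path).readlines()`, gives the same summary for a full run.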