User 542 | 7/29/2014, 7:47:21 PM
Using Graphlab 0.9 in Ubuntu 14.04:
When running a triple_apply function on a large graph (25 million vertices, 450 million edges) on a reasonable machine (12 cores, 32 GB RAM), I end up seeing this in the graphlab logs:
"Unable to reach server for 25 consecutive pings."
over and over, with the count climbing from 25 to about 200 or so, and then the function dies in the Python interpreter with Communication Failure 113.
  File "/usr/local/lib/python2.7/dist-packages/graphlab/datastructures/sgraph.py", line 839, in tripleapply
    return SGraph(proxy=g.proxy.lambdatripleapply(tripleapplyfn, mutatedfields))
  File "cygraph.pyx", line 181, in graphlab.cython.cygraph.UnityGraphProxy.lambdatripleapply
  File "cygraph.pyx", line 185, in graphlab.cython.cygraph.UnityGraphProxy.lambdatripleapply
RuntimeError: Runtime Exception: 0. Communication Failure: 113.
Watching top: while it runs, the pylambda runners seem to fill up my RAM and take about 50% of all of the cores. Then it seems to flush (I presume to reload from disk), the pylambda workers fill back up, and a few moments later I start seeing these messages.
Any ideas? Timeout error? Anything I can do to fix?
EDIT - In addition, killing the Python interpreter and the workers requires kill -9 signals.
EDIT EDIT - Never mind the previous EDIT. On a second attempt I was able to press Ctrl-D, wait a few minutes, and python/graphlab terminated normally.