RuntimeError: Communication Failure: 113.

User 3069 | 1/27/2016, 5:07:12 PM

Tried 4 times, reproduced 100%

System:
Disk size: ~infinite
Memory: 8GB
Cores: 16

Runtime configuration:
gl.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 20)
gl.set_runtime_config('GRAPHLAB_DEFAULT_NUM_GRAPH_LAMBDA_WORKERS', 20)
gl.set_runtime_config('GRAPHLAB_CACHE_FILE_LOCATIONS', largepartitiontmp_folder)

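Minimum code:
import graphlab as gl
from graphlab import SFrame, aggregate
configure()  # presumably applies the runtime configuration listed above
data = SFrame(source)  # Loaded from disk, in native format
result = data.groupby('key', aggregate.CONCAT('a'), aggregate.CONCAT('b'), aggregate.CONCAT('c'))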

Sizes:
700 * 10^6 rows with 4 columns
Grouping yields about 1000-10000 rows per key
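A rough back-of-envelope check of these sizes against the 8GB of memory listed above. The 8 bytes per value is purely an assumption for illustration (the report does not give the column types), so treat the result as an order-of-magnitude sketch rather than a measurement:

```python
# Back-of-envelope estimate of the raw data size.
# ASSUMED_BYTES_PER_VALUE is an assumption, not a figure from the report.
ROWS = 700 * 10**6           # from the report
COLUMNS = 4                  # from the report
ASSUMED_BYTES_PER_VALUE = 8  # hypothetical: e.g. 64-bit ints or floats
RAM_BYTES = 8 * 10**9        # "Memory: 8GB" from the system description


def estimated_raw_bytes(rows, columns, bytes_per_value):
    """Return the uncompressed size of a dense rows x columns table."""
    return rows * columns * bytes_per_value


raw_bytes = estimated_raw_bytes(ROWS, COLUMNS, ASSUMED_BYTES_PER_VALUE)
ratio = raw_bytes / float(RAM_BYTES)

print("estimated raw size: %.1f GB" % (raw_bytes / 1e9))
print("ratio to physical memory: %.1fx" % ratio)
```

Under that assumption the raw values alone are several times larger than physical memory, so the aggregation cannot hold everything in RAM at once and depends on how much is buffered before spilling to disk.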

Server log attached.

Stack trace:
/root/anaconda2/envs/dato-env2/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in runcellmagic(self, magicname, line, cell)
   2291 magicargs = self.varexpand(line, stackdepth)
   2292 with self.builtintrap:
-> 2293 result = fn(magicargs, cell)
   2294 return result
   2295

<decorator-gen-60> in time(self, line, cell, local_ns)

/root/anaconda2/envs/dato-env2/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
    191 # but it's overkill for just that one bit of state.
    192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
    194
    195 if callable(arg):

/root/anaconda2/envs/dato-env2/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in time(self, line, cell, localns)
   1165 else:
   1166 st = clock2()
-> 1167 exec(code, glob, localns)
   1168 end = clock2()
   1169 out = None

<timed exec> in <module>()

/root/anaconda2/envs/dato-env2/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in groupby(self, keycolumns, operations, *args)
   4200 with cythoncontext():
   4201 return SFrame(proxy=self.proxy.groupbyaggregate(keycolumnsarray, groupcolumns,
-> 4202 groupoutputcolumns, group_ops))
   4203
   4204 def join(self, right, on=None, how='inner'):

/root/anaconda2/envs/dato-env2/lib/python2.7/site-packages/graphlab/cython/context.pyc in exit(self, exctype, excvalue, traceback)
     47 if not self.showcythontrace:
     48 # To hide cython trace, we re-raise from here
---> 49 raise exctype(excvalue)
     50 else:
     51 # To show the full trace, we do nothing and let exception propagate

RuntimeError: Communication Failure: 113.

Comments

User 16 | 1/28/2016, 2:09:28 AM

Hi dremovd -

I'm sorry you're having this issue. Could you let me know which version of GraphLab you are using?

Thanks, Toby


User 3069 | 1/30/2016, 1:38:12 PM

The latest one; I upgraded recently (about a week ago).


User 3069 | 1/30/2016, 7:11:53 PM

The current version is v1.8; I will upgrade today.


User 3069 | 1/30/2016, 7:36:41 PM

I can also see that allocated memory keeps growing, so this is probably a memory-related issue. The process takes all physical memory and then starts filling swap as well. Right before the error, all physical memory and swap are full, so usage is more than 16 GB.


User 3069 | 1/31/2016, 11:40:33 AM

I managed to process the data by drastically reducing the groupby buffer size:
gl.set_runtime_config('GRAPHLAB_SFRAME_GROUPBY_BUFFER_NUM_ROWS', 1024)
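To illustrate why a smaller row buffer can trade memory for disk I/O, here is a minimal pure-Python sketch of a buffered group-by with CONCAT semantics. It is not GraphLab's actual implementation; `groupby_concat`, `_spill`, `_read_run`, and `buffer_num_rows` are hypothetical names, and the spill-and-merge strategy is just the generic external-sort approach, assumed here for illustration:

```python
import heapq
import os
import pickle
import tempfile
from itertools import groupby
from operator import itemgetter


def _spill(buffer, tmp_dir):
    """Sort buffered (key, value) rows by key and write them to a run file."""
    buffer.sort(key=itemgetter(0))  # stable sort keeps input order per key
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for row in buffer:
            pickle.dump(row, f)
    return path


def _read_run(path):
    """Stream rows back from a run file one at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def groupby_concat(rows, buffer_num_rows, tmp_dir=None):
    """Yield (key, [values]) while holding at most buffer_num_rows input rows."""
    run_paths, buffer = [], []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= buffer_num_rows:
            # Buffer is full: flush it to disk instead of growing in memory.
            run_paths.append(_spill(buffer, tmp_dir))
            buffer = []
    if buffer:
        run_paths.append(_spill(buffer, tmp_dir))
    try:
        # Merge the sorted runs lazily, then collect the values of each key.
        merged = heapq.merge(*[_read_run(p) for p in run_paths],
                             key=itemgetter(0))
        for key, group in groupby(merged, key=itemgetter(0)):
            yield key, [value for _, value in group]
    finally:
        for p in run_paths:
            os.remove(p)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
for key, values in groupby_concat(rows, buffer_num_rows=2):
    print(key, values)
```

In this sketch, peak memory is bounded by the buffer plus the largest single group, so lowering `buffer_num_rows` reduces memory at the cost of more run files and more merging work.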