Synchronous engine halts on the LiveJournal dataset when running on a cluster

User 121 | 7/28/2014, 3:28:26 PM

Hi all, I am running a clustering coefficient program (adapted from the undirected triangle counting[1] program) on a cluster using 10 nodes. The tail of the console output is as follows:

Allocated: 983.655 MB
INFO: memoryinfo.cpp(logusage:90): Memory Info: After Engine Initialization
Heap: 1715.76 MB Allocated: 983.37 MB
INFO: memoryinfo.cpp(logusage:90): Memory Info: After Engine Initialization
Heap: 1716.02 MB Allocated: 983.608 MB
INFO: memoryinfo.cpp(logusage:90): Memory Info: After Engine Initialization
Heap: 1716.65 MB Allocated: 984.091 MB
INFO: synchronous_engine.hpp(start:1299): Iteration counter will only output every 5 seconds.
INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 0
INFO: synchronous_engine.hpp(start:1363): Active vertices: 4846609

After this, I get the following error message: mpirun noticed that process rank 9 with PID 8740 on node node040 exited on signal 9 (Killed).

The program runs fine on smaller graphs, but on the LJ dataset (4M vertices and 6 million edges) it halts. My vertexdata is quite bulky (vidvector[1] of IN edges, OUT edges, and neighbours). Could that be the reason? Where should I look for more information?
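
For a rough sense of whether the vertex data alone could exhaust memory, here is a back-of-the-envelope sketch (a process killed by signal 9 is often, though not always, a victim of the kernel's out-of-memory killer). Everything in it is an assumption for illustration: the function name `adjacency_bytes` is hypothetical, the 4-byte id size and the `replication` factor are guesses rather than values read from the engine, and vector capacity and per-vector overhead are ignored. It only counts how many ids end up stored if every vertex keeps IN, OUT, and neighbour lists.

```python
def adjacency_bytes(num_edges, bytes_per_id=4, replication=1.0):
    """Rough estimate of bytes spent on ids stored in per-vertex lists.

    Assumptions (not taken from the engine):
      - bytes_per_id: size of one vertex id (4 is a guess)
      - replication: average number of copies of each vertex's data
        held across the cluster (1.0 means no extra copies)
    """
    # Each directed edge (u, v) is stored once in u's OUT list, once in
    # v's IN list, and up to twice in neighbour lists (v in u's, u in v's).
    ids_stored = 4 * num_edges
    return ids_stored * bytes_per_id * replication


if __name__ == "__main__":
    # Edge count quoted in this post; substitute the real count of the input.
    estimate = adjacency_bytes(6_000_000)
    print(f"approx. {estimate / 2**20:.1f} MiB of ids across the cluster")
```

If the estimate comes out close to the memory available per node once divided across the machines and multiplied by a realistic replication factor, the vertex data is a plausible suspect; if it is far smaller, the cause is more likely elsewhere, for example in what is accumulated during the gather phase.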

[1] https://github.com/graphlab-code/graphlab/blob/b04106c6234e3b513404af83a608cef866768643/toolkits/graphanalytics/undirectedtriangle_count.cpp
