Maximum number of threads on a single node?

User 938 | 11/12/2014, 12:19:37 AM

I am trying to run the triangle counting application in GraphLab on a 40-core machine with two-way hyper-threading (80 hardware threads in total). I have downloaded and installed the latest version of GraphLab. However, I get the following error when I try to use all 80 hardware threads by setting the GRAPHLAB_THREADS_PER_WORKER environment variable to 80:

This program counts the exact number of triangles in the provided graph.

GRAPHLAB_SUBNET_ID/GRAPHLAB_SUBNET_MASK environment variables not defined.
Using default values
Subnet ID: 0.0.0.0
Subnet Mask: 0.0.0.0
Will find first IPv4 non-loopback address matching the subnet
ERROR: fiber_control.cpp(launch:266): Check failed: affinity.popcount()>0 [0 > 0]
[mirasol:50398] *** Process received signal ***
[mirasol:50398] Signal: Aborted (6)
[mirasol:50398] Signal code: (-6)
[mirasol:50398] [ 0] /lib64/libpthread.so.0[0x364ea0f710]
[mirasol:50398] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x364e232635]
[mirasol:50398] [ 2] /lib64/libc.so.6(abort+0x175)[0x364e233e15]
[mirasol:50398] [ 3] ./undirected_triangle_count(_ZN8graphlab13fiber_control6launchEN5boost8functionIFvvEEEmNS_18fixed_dense_bitsetILi64EEE+0x49)[0x591759]
[mirasol:50398] [ 4] ./undirected_triangle_count(_ZN8graphlab11fiber_group6launchERKN5boost8functionIFvvEEENS_18fixed_dense_bitsetILi64EEE+0xa1)[0x5958a1]
[mirasol:50398] [ 5] ./undirected_triangle_count(_ZN8graphlab19distributed_control4initERKSt6vectorISsSaISsEERKSstmNS_12dc_comm_typeE+0xa21)[0x5a14b1]
[mirasol:50398] [ 6] ./undirected_triangle_count(_ZN8graphlab19distributed_controlC1Ev+0x446)[0x5a3156]
[mirasol:50398] [ 7] ./undirected_triangle_count(main+0x5e8)[0x4b7ad8]
[mirasol:50398] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd)[0x364e21ed5d]
[mirasol:50398] [ 9] ./undirected_triangle_count[0x4b3249]
[mirasol:50398] *** End of error message ***
Aborted (core dumped)

When I set GRAPHLAB_THREADS_PER_WORKER to 66 or less, it works fine; with 67 or more, it crashes. Does anyone know why this is the case? I also get the same error when GRAPHLAB_THREADS_PER_WORKER is not defined at all. I learned that I needed to define this variable from http://forum.graphlab.com/discussion/81/fiber-control-affinity-check-is-failing.

Comments

User 6 | 11/12/2014, 5:56:34 AM

Hi, the reason is here: https://github.com/graphlab-code/graphlab/blob/2e2a0d5ce1ffae81b3007ed018425d8e4115f328/src/graphlab/parallel/fiber_control.hpp#L44. We assume there are at most 64 cores. I would try increasing this number and recompiling. Let us know if this works for you.
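
To illustrate why a fixed 64-entry limit can produce an empty affinity set, here is a minimal sketch using std::bitset as a stand-in. The stack trace above shows that the affinity argument is a fixed_dense_bitset<64>, but this sketch does not reproduce GraphLab's actual code: make_affinity, its parameters, and the way worker IDs are assigned are all hypothetical and only meant to show the general failure mode.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper (not part of GraphLab): builds an affinity mask for
// workers [first_worker, first_worker + num_workers) in a bitset that can
// hold at most MaxWorkers entries.
template <std::size_t MaxWorkers>
std::bitset<MaxWorkers> make_affinity(std::size_t first_worker,
                                      std::size_t num_workers) {
  std::bitset<MaxWorkers> mask;
  for (std::size_t w = first_worker; w < first_worker + num_workers; ++w) {
    // Worker IDs at or beyond the capacity cannot be represented, so they
    // are silently dropped in this sketch.
    if (w < MaxWorkers) {
      mask.set(w);
    }
  }
  return mask;
}

// Mirrors the shape of the failing check, "affinity.popcount() > 0":
// an affinity mask must name at least one worker.
template <std::size_t MaxWorkers>
bool affinity_is_valid(const std::bitset<MaxWorkers>& mask) {
  return mask.count() > 0;
}
```

With a capacity of 64, make_affinity<64>(64, 16) names no workers at all, so affinity_is_valid returns false; with a capacity of 128, the same call sets 16 bits and the check passes. This matches the suggestion to raise the limit and recompile, although it does not by itself explain why the observed threshold was 66 rather than 64.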


User 938 | 11/15/2014, 1:28:51 AM

Thanks, Danny. I have gotten it to work by changing the number and recompiling.