GraphLab not working on nodes with different glibc versions

User 79 | 3/2/2014, 6:22:50 AM

Hi ,

The glibc version on the machine where I build GraphLab differs from the version on the cluster nodes. How can I run GraphLab on a cluster whose nodes have different OSes (and therefore different glibc versions)? The error I am getting is:

    mpiexec -n 1 --hostfile ~/machines /home/hadoopnew/graphlab/release/toolkits/graph_analytics/pagerank --graph=graphdata/ --format="tsv" --saveprefix=out --graphopts="ingress=random"
    /home/hadoopnew/graphlab/release/toolkits/graph_analytics/pagerank: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.16' not found (required by /home/hadoopnew/graphlab/release/toolkits/graph_analytics/pagerank)
    /home/hadoopnew/graphlab/release/toolkits/graph_analytics/pagerank: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.15' not found (required by /home/hadoopnew/graphlab/release/toolkits/graph_analytics/pagerank)
    /home/hadoopnew/graphlab/release/toolkits/graph_analytics/pagerank: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found (required by /home/hadoopnew/graphlab/release/toolkits/graph_analytics/pagerank)

Thanks Dinesh

Comments

User 6 | 3/2/2014, 6:34:45 AM

Hi Dinesh, you should compile GraphLab for the target architecture of your cluster nodes.

best,


User 79 | 3/2/2014, 6:38:39 AM

Hi,

The target architecture differs between nodes in the cluster. If I compile separately for each one, the binary hashes differ and GraphLab refuses to run. This is the error I get when I compile for multiple targets:

    mpiexec -n 2 --hostfile ~/machines /home/hadoopnew/graphlab/release/toolkits/graph_analytics/pagerank --graph=graphdata/ --format="tsv" --saveprefix=out --graphopts="ingress=random"
    GRAPHLAB_SUBNET_ID/GRAPHLAB_SUBNET_MASK environment variables not defined.
    Using default values
    Subnet ID: 0.0.0.0
    Subnet Mask: 0.0.0.0
    Will find first IPv4 non-loopback address matching the subnet
    FATAL: dc_tcp_comm.cpp(accept_handler:532): MD5 mismatch.
    Process 1 has hash ada94590c4adb4324ab879d4a7719201
    Process 0 has hash 3923bb7c8b8e3e0022885567da2cbe34
    GraphLab requires all machines to run exactly the same binary.
    [hadoopnew-VirtualBox:10574] *** Process received signal ***
    [hadoopnew-VirtualBox:10574] Signal: Aborted (6)
    [hadoopnew-VirtualBox:10574] Signal code: (-6)
    [hadoopnew-VirtualBox:10574] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfbd0) [0x7f7ad9134bd0]
    [hadoopnew-VirtualBox:10574] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f7ad6cd0037]
    [hadoopnew-VirtualBox:10574] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f7ad6cd3698]
    [hadoopnew-VirtualBox:10574] [ 3] /home/hadoopnew/graphlab/release/toolkits/graph_analytics/pagerank(_ZN8graphlab7dc_impl11dc_tcp_comm14accept_handlerEv+0xaa1) [0x60d671]
    [hadoopnew-VirtualBox:10574] [ 4] /home/hadoopnew/graphlab/release/toolkits/graph_analytics/pagerank(_ZN8graphlab6thread6invokeEPv+0x32) [0x5b2e82]
    [hadoopnew-VirtualBox:10574] [ 5] /lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e) [0x7f7ad912cf8e]
    [hadoopnew-VirtualBox:10574] [ 6] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7ad6d92e1d]
    [hadoopnew-VirtualBox:10574] *** End of error message ***

    mpiexec noticed that process rank 1 with PID 10574 on node 10.6.30.73 exited on signal 6 (Aborted)

Thanks Dinesh


User 6 | 3/2/2014, 3:33:27 PM

You could try to comment out our compatible Linux version check, but this is not recommended since the run may fail with weird errors. I suggest switching to two machines with the same Linux version.
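
To confirm up front whether the binaries on two nodes really are identical, you can hash them yourself before launching. The following is only a rough sketch. It assumes the check behind the "MD5 mismatch" message compares an MD5 digest of the executable, which is inferred from the error text and not verified against the GraphLab source. The helper names file_md5 and same_binary are made up for illustration.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(digests):
    """True if every node in the {node_name: digest} mapping reports the same digest."""
    return len(set(digests.values())) <= 1
```

Run file_md5 on the pagerank binary on each node and collect the results. With the two hashes from the log above, same_binary({"process0": "3923bb7c8b8e3e0022885567da2cbe34", "process1": "ada94590c4adb4324ab879d4a7719201"}) is False, which matches the failure.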


User 15 | 3/4/2014, 4:54:10 PM

Hi Dinesh,

I think the easiest solution to this problem is actually to compile GraphLab on the cluster node that has the oldest version of libc. You can check which version of libc is installed on a system by executing "ldd --version". The issue is that the GNU C Library uses versioned symbols, which are backward compatible but not forward compatible: a binary built against an older glibc runs on a newer one, but not the other way around. You can verify that this is happening by executing "objdump -t (thebinary)" and grepping the output for "GLIBC_2.1", or whatever version of GLIBC you are interested in. If the binary references a symbol with a newer version than the one reported by "ldd --version" on that machine, you will get an error like the one you showed.
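
The same check can be scripted. This is a minimal sketch, not anything that ships with GraphLab, and all four function names are hypothetical. It assumes objdump and ldd are on the PATH and that the first line of "ldd --version" ends with the glibc version number. It calls "objdump -T" (the dynamic symbol table) instead of "-t", an assumption made here so that it also works on stripped binaries.

```python
import re
import subprocess

# Matches version tags such as GLIBC_2.14 or GLIBC_2.2.5 (but not GLIBC_PRIVATE).
_GLIBC_TAG = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def glibc_versions(text):
    """Return every GLIBC_x.y tag found in `text` as a set of int tuples."""
    return {tuple(int(part) for part in tag.split("."))
            for tag in _GLIBC_TAG.findall(text)}


def parse_ldd_version(first_line):
    """Extract the trailing version number from the first line of `ldd --version`."""
    match = re.search(r"(\d+(?:\.\d+)+)\s*$", first_line)
    if match is None:
        raise ValueError("no version number found in: %r" % first_line)
    return tuple(int(part) for part in match.group(1).split("."))


def newest_required(binary_path):
    """Newest GLIBC tag referenced by the binary, or None if it references none."""
    result = subprocess.run(["objdump", "-T", binary_path],
                            capture_output=True, text=True, check=True)
    versions = glibc_versions(result.stdout)
    return max(versions) if versions else None


def local_glibc():
    """glibc version of this machine, read from `ldd --version`."""
    result = subprocess.run(["ldd", "--version"],
                            capture_output=True, text=True, check=True)
    return parse_ldd_version(result.stdout.splitlines()[0])
```

On a given node, the binary should load only if newest_required(path) <= local_glibc(). Versions are compared as integer tuples so that (2, 14) correctly sorts above (2, 2, 5), which a plain string comparison would get wrong.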

If you're not allowed to compile on the cluster nodes that you are targeting, there are a few options:

- Find an older Linux machine with an old enough libc
- Compile in a VM with an OS with an old enough libc
- Set up a chroot environment (...with an old enough libc); here are directions on how to do this in Ubuntu: https://help.ubuntu.com/community/BasicChroot
- Set up a cross-compiler toolchain

...with of course, each final step being to move the binaries you created to your target machine.

If you are on a fancy enough cluster, the cross-compiler for each possible cluster node you could target may already be set up. On clusters I have used in the past, it was as simple as executing a "module add (arch_name)" and then compiling, but every cluster environment is different...you'd have to ask your administrator, if that is not you.

Just to be clear, this is not a problem specific to GraphLab...you would run into this when compiling anything from source on different OS versions with different libc versions. There are lots of posts on forums across the web complaining about this very same problem. I just faced it myself. :)

Evan