Multiple network interfaces on each machine: how to explicitly choose one?

User 37 | 5/12/2014, 7:13:27 PM

Hi,

I found this in the release notes:

    Unable to launch distributed GraphLab when each machine has multiple network interfaces: The communication initialization currently takes the first non-localhost IP address as the machine’s IP. A more reliable solution will be to use the hostname used by MPI. A solution is in the works.

My machines do have multiple network interfaces, each with a different host name. Do you know how to explicitly choose the network interface GraphLab uses? Just providing the proper host names to MPI does not seem to be enough.

Thanks, Cui

Comments

User 6 | 5/13/2014, 3:56:13 AM

By default you do not need to worry about the subnet ID and subnet mask. The original IP address detection routine makes GraphLab use the first IP address of the machine that is not localhost.

However, this is a problem on systems with multiple NICs where, for instance, one NIC is used for the external network and another for the internal network. Selecting the wrong address prevents GraphLab from operating correctly.

Starting from GraphLab 2.1, GraphLab has the capability to select the network to use.

The IP detection routines read the environment variables GRAPHLAB_SUBNET_ID and GRAPHLAB_SUBNET_MASK, if they are set, and use the first local IP address that matches the subnet.

For instance,

    mpiexec -n 2 env GRAPHLAB_SUBNET_ID=192.168.0.0 GRAPHLAB_SUBNET_MASK=255.255.0.0 ./pagerank ...

will run pagerank over the 192.168.0.0/255.255.0.0 network.

If only GRAPHLAB_SUBNET_ID is provided, GraphLab tries to guess the subnet mask by left-extending the subnet ID.

If neither GRAPHLAB_SUBNET_ID nor GRAPHLAB_SUBNET_MASK is provided, both default to 0.0.0.0. A 0.0.0.0 mask matches every address, so this reproduces the original behavior.
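The selection rule amounts to a bitwise test: an address matches when (address AND mask) equals (subnet ID AND mask), and the first non-loopback address that passes is used. The POSIX sh sketch below only illustrates that rule; it is not GraphLab's implementation, the helpers ip_to_int, in_subnet and pick_ip are hypothetical, and the candidate addresses are made up (two of them borrowed from the log later in this thread). It also leaves out the mask-guessing case, since the exact "left-extending" rule is not spelled out here.

```shell
#!/bin/sh
# Illustration of the subnet-matching rule; hypothetical helpers, not GraphLab code.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split "a.b.c.d" into $1 $2 $3 $4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 lies in subnet $2 with mask $3,
# i.e. (address AND mask) == (subnet ID AND mask).
in_subnet() {
  addr_n=$(ip_to_int "$1")
  id_n=$(ip_to_int "$2")
  mask_n=$(ip_to_int "$3")
  [ $(( addr_n & mask_n )) -eq $(( id_n & mask_n )) ]
}

# Print the first non-loopback address among $3... matching subnet $1 / mask $2.
pick_ip() {
  subnet=$1
  mask=$2
  shift 2
  for addr in "$@"; do
    case $addr in
      127.*) continue ;;   # skip loopback addresses
    esac
    if in_subnet "$addr" "$subnet" "$mask"; then
      echo "$addr"
      return 0
    fi
  done
  return 1
}

# Hypothetical addresses reported by one machine with several NICs.
pick_ip 192.168.0.0 255.255.0.0 127.0.0.1 146.6.53.10 192.168.127.1   # prints 192.168.127.1
pick_ip 0.0.0.0 0.0.0.0 127.0.0.1 146.6.53.10 192.168.127.1           # prints 146.6.53.10
```

The second call shows why the 0.0.0.0/0.0.0.0 default matches the old behavior: with an all-zero mask every address passes, so the first non-loopback address wins.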

Define the environment variables with export (in bash) or setenv (in tcsh), depending on your shell.
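For example, in a POSIX shell such as bash, the 192.168.0.0/255.255.0.0 network used above could be selected as follows, with the tcsh form shown in a comment. The values are only the ones from the example; substitute your own. Note that this affects only processes started from that shell on that machine.

```shell
#!/bin/sh
# bash / sh: export makes the variables visible to every process
# started from this shell (for example mpiexec).
export GRAPHLAB_SUBNET_ID=192.168.0.0
export GRAPHLAB_SUBNET_MASK=255.255.0.0

# tcsh / csh equivalent (no '=' sign):
#   setenv GRAPHLAB_SUBNET_ID 192.168.0.0
#   setenv GRAPHLAB_SUBNET_MASK 255.255.0.0

# Show what child processes will inherit.
env | grep '^GRAPHLAB_SUBNET'
```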


User 37 | 5/20/2014, 2:49:20 PM

It works, thanks!


User 350 | 4/28/2015, 10:29:42 PM

Hi, I'm experiencing a problem when my machines have multiple interfaces. I've set the environment variables for the subnet ID and mask, but communication still does not work. Here is what I get after running on 2 machines:

    GRAPHLAB_SUBNET_ID/GRAPHLAB_SUBNET_MASK environment variables not defined. Using default values
    Subnet ID: 146.6.53.0
    Subnet Mask: 255.255.255.0
    Will find first IPv4 non-loopback address matching the subnet
    [asterix][[19371,1],0][btl_tcp_endpoint.c:792:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.127.1 failed: Connection timed out (110)
    [obelix][[19371,1],1][btl_tcp_endpoint.c:792:mca_btl_tcp_endpoint_complete_connect] connect() to 10.0.0.1 failed: Connection timed out (110)

Environment variables are set:

    env | grep GRAPHLAB
    GRAPHLAB_SUBNET_ID=146.6.53.0
    GRAPHLAB_SUBNET_MASK=255.255.255.0

When I disable all (except one) interfaces on the slave machine, everything works.

Any help?

Thanks a lot, Michael.


User 1592 | 4/30/2015, 12:49:50 PM

Hi Michael, if you are using MPI, you may have defined the environment variables on one machine while the other machines do not receive them. Pass them through MPI instead, using either -x or env depending on your MPI implementation. For example:

    mpiexec --hostfile machines -x LD_LIBRARY_PATH=/home/daroczyb/graphlab/deps/local/lib/ /mnt/info/home/daroczyb/als /mnt/info/home/daroczyb/smallnetflix_mm.train
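Applied to the GraphLab variables, the launch line might look like the sketch below. It only builds and prints the commands so it runs anywhere; the hostfile name machines, the process count and ./pagerank are placeholders, and the subnet values are Michael's. -x VAR=value is Open MPI's flag, -genv VAR value is the MPICH/Hydra equivalent, and prefixing the program with env VAR=value (as in the pagerank example earlier in this thread) works with either launcher because env runs on every node.

```shell
#!/bin/sh
# Placeholder values: use your own subnet, hostfile, process count and program.
SUBNET_ID=146.6.53.0
SUBNET_MASK=255.255.255.0
HOSTFILE=machines
NPROCS=2
PROGRAM=./pagerank

# Open MPI: -x exports a variable to every rank, including remote ones.
openmpi_cmd="mpiexec --hostfile $HOSTFILE -x GRAPHLAB_SUBNET_ID=$SUBNET_ID -x GRAPHLAB_SUBNET_MASK=$SUBNET_MASK $PROGRAM"

# MPICH / Hydra: -genv sets a variable for all ranks.
mpich_cmd="mpiexec -f $HOSTFILE -genv GRAPHLAB_SUBNET_ID $SUBNET_ID -genv GRAPHLAB_SUBNET_MASK $SUBNET_MASK $PROGRAM"

# Launcher-independent: env sets the variables on whichever node runs the program.
portable_cmd="mpiexec -n $NPROCS env GRAPHLAB_SUBNET_ID=$SUBNET_ID GRAPHLAB_SUBNET_MASK=$SUBNET_MASK $PROGRAM"

# Print instead of running; copy the line that matches your MPI.
printf '%s\n' "$openmpi_cmd" "$mpich_cmd" "$portable_cmd"

# Sanity check (Open MPI) that remote ranks really see the variables:
#   mpiexec --hostfile machines -x GRAPHLAB_SUBNET_ID -x GRAPHLAB_SUBNET_MASK env | grep GRAPHLAB
```

Separately, the connect() failures in the log come from Open MPI's own TCP transport (btl_tcp_endpoint.c), which chooses interfaces independently of GraphLab. If those persist, Open MPI's btl_tcp_if_include MCA parameter can restrict it to specific interfaces, e.g. --mca btl_tcp_if_include eth0, where eth0 is a hypothetical name for the interface on the shared subnet.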