How to ensure mpiexec runs instances on different nodes

User 568 | 11/5/2014, 9:40:23 PM

Hi all,

I use the following command to run the program:

mpiexec -n 4 -hostfile ~/machines release/apps/experiment/mshortestpath --graph /grid/0/j/dataset/twitter_input.

The machines hostfile lists 8 nodes.

But the log contains the following messages:

WARNING: dc.cpp(init:587): Duplicate IP address: 10.0.10.161
WARNING: dc.cpp(init:592): For maximum performance, GraphLab strongly prefers running just one process per machine.

Does this mean that the program runs on a single node?

Comments

User 568 | 11/5/2014, 10:23:38 PM

I then ran

mpiexec -n 8 -hostfile ~/machines release/apps/experiment/mshortestpath --graph /grid/0/j/dataset/twitter_input.

I have a cluster of 8 nodes, and every node has 90+ GB of memory. The dataset is about 20 GB.

However, the program ends with:

terminate called after throwing an instance of 'std::bad_alloc'
terminate called after throwing an instance of 'std::bad_alloc'
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
what(): std::bad_alloc
what(): std::bad_alloc


User 6 | 11/6/2014, 7:43:31 AM

There are two different issues here.

1) Duplicate IP: two MPI PowerGraph processes were run on the same physical machine. Check your hostfile; it may not list enough distinct hosts.

2) bad_alloc: the system ran out of memory, probably because two MPI processes ran on the same physical machine, halving the memory available to each.
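
One way to check the hostfile for this is sketched below in Python. It is a minimal sketch and not part of GraphLab: the function names `hosts_from_hostfile` and `duplicate_ips` are hypothetical, and it assumes one host per line with the host name as the first token (optionally followed by an MPICH-style `:N` suffix or other fields such as `slots=N`), `#` comments, and host names or IPv4 addresses. It resolves each entry and reports any IP address that more than one entry maps to.

```python
import socket
from collections import defaultdict


def hosts_from_hostfile(lines):
    """Return the host name from each non-empty, non-comment line."""
    hosts = []
    for line in lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Assumption: first token is the host; strip an MPICH-style ":N" suffix.
        hosts.append(line.split()[0].split(":", 1)[0])
    return hosts


def duplicate_ips(hosts, resolve=socket.gethostbyname):
    """Map each IP that appears more than once to the host names behind it."""
    by_ip = defaultdict(list)
    for host in hosts:
        by_ip[resolve(host)].append(host)
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}
```

For example, `duplicate_ips(hosts_from_hostfile(open(path)))`, with `path` pointing at the hostfile, returns an empty dict when every entry resolves to a distinct address. This only checks the file itself; it does not show where mpiexec actually places the processes.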

We recommend trying out GraphLab Create, as a 20 GB input file should not be a problem to run on a single multicore machine, and shortest path computation is supported.

Best,


User 568 | 11/6/2014, 2:02:51 PM

The hostfile has 8 nodes, and every node has 90 GB of memory.

Do you know why it still reports a duplicate IP?