PowerGraph hangs when I run PageRank on two machines.

User 2745 | 12/11/2015, 6:09:12 AM

When I run "mpiexec -n 2 -hostfile machines graphlab/release/toolkits/graph_analytics/pagerank --powerlaw=100000" on two machines, it prints the following:

"GRAPHLAB_SUBNET_ID/GRAPHLAB_SUBNET_MASK environment variables not defined. Using default values Subnet ID: 0.0.0.0 Subnet Mask: 0.0.0.0 Will find first IPv4 non-loopback address matching the subnet"

After that it stops printing anything. When I run "top" on both machines, the "PageRank" program is at 100% CPU. Another person ran into this problem because of something wrong with virtbr0, but that fix does not help in my situation.
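
For reference, the "first IPv4 non-loopback address matching the subnet" rule from the log can be sketched as below. This is only an illustration, assuming the conventional test (address AND mask) == (subnet ID AND mask); it is not taken from the PowerGraph source, and the helper names ip_to_int and matches_subnet are made up for this example. With the defaults 0.0.0.0/0.0.0.0 every address matches, which would explain why a virtual bridge interface can get picked.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of PowerGraph.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit status 0) if address $1 lies in subnet $2 under mask $3.
# Assumed rule: (ip & mask) == (subnet_id & mask).
matches_subnet() {
  local ip id mask
  ip=$(ip_to_int "$1")
  id=$(ip_to_int "$2")
  mask=$(ip_to_int "$3")
  (( (ip & mask) == (id & mask) ))
}

# Default values from the log: any address matches.
matches_subnet 192.168.122.1 0.0.0.0 0.0.0.0 && echo "default subnet: match"

# A restrictive subnet rejects addresses outside it.
matches_subnet 192.168.122.1 10.0.0.0 255.0.0.0 || echo "10.0.0.0/255.0.0.0: no match"
```

If this is the cause, one thing to try is setting both variables on every machine before launching, for example "export GRAPHLAB_SUBNET_ID=10.0.0.0" and "export GRAPHLAB_SUBNET_MASK=255.0.0.0", with values chosen to match the network the two machines actually share. The 10.0.0.0 values and the underscore spelling of the variable names are assumptions here.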

Someone suggested replacing OpenMPI with MPICH.

Does anybody know how to do this?

Thanks!!

Comments

User 1592 | 12/11/2015, 4:15:41 PM

Can you run "mpiexec -n 2 -hostfile machines ls" and let us know if this works for you?


User 2745 | 12/12/2015, 12:30:57 AM

Hi Danny, as you asked, the output is: "ExperimentData backtrace.0 docker graphlab machines ExperimentData backtrace.0 docker graphlab machines"

Thanks for your help!


User 1592 | 12/12/2015, 8:12:06 AM

Can you run with --powerlaw=1000 and see if it works for you?


User 2745 | 12/12/2015, 12:13:36 PM

Hi Danny, I have tried --powerlaw=1000 and --powerlaw=100; the results are the same as with --powerlaw=100000 before. It still hangs. (Everything works when I run PageRank on a single machine.) Thanks for your answer!


User 1592 | 12/12/2015, 2:35:33 PM

Can you make sure you are running on 64-bit machines? Can you also make sure you compiled the GraphLab executable on each machine (rather than copying the executable from one to the other)? If the machines have different systems, it may not work.
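
A quick way to check the 64-bit question on each machine is sketched below. It only uses the standard uname and getconf utilities; the function name report_arch is made up for this example.

```shell
#!/usr/bin/env bash
# Print this machine's node name, kernel architecture, and userland word size.
report_arch() {
  echo "$(uname -n): arch=$(uname -m) bits=$(getconf LONG_BIT)"
}

report_arch
```

Running it on both hosts and comparing the lines should show whether the two systems agree; on a 64-bit x86 install one would expect arch=x86_64 and bits=64.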


User 2745 | 12/12/2015, 2:53:17 PM

Hi Danny, I am sure both machines run Ubuntu 14.04 LTS, 64-bit (x86_64). GraphLab was compiled successfully and independently on each of them. Each machine outputs the correct result when I run "./pagerank --powerlaw=10000".


User 2745 | 12/16/2015, 2:23:02 PM

I looked at the PowerGraph source code. Based on the terminal output, I think the problem is in dc.cpp (graphlab->rpc->dc.cpp). The program may be hanging in the function init(), where there are two loops before the cluster is built. But I cannot understand the code well. Any help would be appreciated! @Danny Bickson Thanks!!!!