Powergraph on multicore machines

User 350 | 2/13/2015, 7:05:00 PM


We have 2 machines, 36 cores each. When we run experiments with "mpiexec -n 2 -map-by node -hostfile ~/machines", powergraph uses only a single core on each machine. What should we do to exploit all the available (72) cores?

Maybe: "mpiexec -n 72 -map-by core -hostfile ~/machines"?

But I think that powergraph gives a warning that "it is recommended to use only one process per machine".

So how can we take advantage of multicore machines?

Thanks a lot, Michael.


User 6 | 2/13/2015, 7:09:45 PM

--ncpus should set the number of CPUs per process; -n (the mpiexec option) should set the number of multicore machines.

Let us know if this works for you

User 350 | 2/13/2015, 7:11:09 PM

Oh, I will try this option and let you know. Thanks a lot!

User 350 | 2/13/2015, 11:06:51 PM

I tried --ncpus but it didn't work. When I used "--ncpus 34", it showed the error "...[32<32]...". When I used "--ncpus 30", it ran but utilized only one core...

User 6 | 2/13/2015, 11:41:23 PM

Hi Michael, I have no clue which application you are trying to run, as you sent only a partial command line. Also, the error message you gave me is just 10 characters of it!! Please send the full application command line and the full error message. Posting words on this forum does not cost money!!

User 350 | 2/13/2015, 11:47:04 PM

Hi Danny,

I'm trying to run my own application, which is based on the triangle counting example. The error message I got was: ERROR: fibercontrol.cpp(launch:270): Check failed: b<nworkers [32 < 32]

The command I use is: /usr/local/bin/mpiexec -n 2 -map-by node -hostfile ~/machines ./counting --graph live_journal.txt --format tsv --ncpus 34

Thanks, Michael.

User 6 | 2/13/2015, 11:47:12 PM

Also, when you run ./mpapp <command line arguments> --ncpus=32

does it work?

User 6 | 2/13/2015, 11:51:37 PM

1) It seems your machine has only 32 cores, not as many as you think. 2) Your application may have a bug. Try to run one of our examples to see if it works with all 32 cores.

User 350 | 2/14/2015, 12:05:59 AM

Thanks Danny, I will try with some built-in applications. Should "undirectedtrianglecount" work? Regarding the cores, the spec says: Two Intel Xeon E5-2699 v3 2.3GHz, 45M Cache, 9.60GT/s QPI, Turbo, HT, 18C/36T
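
For reference, a rough sketch of how that spec line translates into core counts. It assumes that "Two" means two sockets per machine and that 18C/36T is per socket; the variable names are only for illustration.

```shell
# Values read off the spec line above (assumed: 2 sockets, 18C/36T per socket).
sockets=2
cores_per_socket=18
threads_per_socket=36

# Physical cores and logical CPUs (with Hyper-Threading) per machine.
physical_cores=$((sockets * cores_per_socket))
logical_cpus=$((sockets * threads_per_socket))

echo "physical cores per machine: $physical_cores"
echo "logical CPUs per machine:   $logical_cpus"

# On Linux the live values can be compared with:
#   nproc
#   lscpu | grep -E '^(Socket|Core|Thread)'
```

Under these assumptions each machine has 36 physical cores and 72 logical CPUs.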

User 350 | 2/14/2015, 7:02:37 PM

Hi Danny,

When I run the program (built-in undirectedtrianglecount) without mpiexec, it works on many cores: ./undirectedtrianglecount --graph live_journal.txt --format tsv --ncpus 30

and without --ncpus it also defaults to 30 cores.

BUT, when I run it with mpiexec, it always uses only one core, no matter what I put in --ncpus.

/usr/local/bin/mpiexec -n 1 --map-by node -hostfile ~/machines undirectedtrianglecount --graph live_journal.txt

I also tried without the "--map-by node" option. I started using this option because without it mpiexec launches the processes on a single machine rather than on multiple machines.

Do you have any idea how to make it use many cores with mpiexec?

Thanks a lot, Michael.

User 6 | 2/14/2015, 8:49:36 PM

Hi again. It sounds like your MPI setup is wrong or not working. Multiple CPUs should also work with MPI. First try to execute a non-powergraph command, for example: mpiexec -n 2 ls

and try to fix it so that it runs on two machines. The machines should be able to ssh from one to the other (and vice versa) without passwords.
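
A rough sketch of that ssh check, assuming the hostfile has one host per line with the host name in the first field. The helper names list_hosts and check_hosts are made up for this example.

```shell
# Hypothetical helper: print the host names listed in an MPI hostfile,
# skipping blank lines and comment lines; only the first field is kept.
list_hosts() {
    awk '!/^[[:space:]]*#/ && NF { print $1 }' "$1"
}

# Try a password-less login to every host. BatchMode makes ssh fail
# instead of prompting for a password.
check_hosts() {
    for host in $(list_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: password-less ssh failed"
        fi
    done
}

# Usage (run it on each machine in turn):
#   check_hosts ~/machines
```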

User 350 | 2/14/2015, 9:40:25 PM

I'm able to run on two machines, but only when I add the "--map-by node" option to mpiexec.

Now I have also figured out how to run on many cores: by adding the "-cpus-per-proc 30" option to mpiexec.

Is this a good workaround, or is it not supposed to work like this?

BTW, when I try:

mpiexec -n 2 --hostfile ~/machines ls

it prints the files of the same machine twice.

But: mpiexec -n 2 --map-by node --hostfile ~/machines ls

works correctly and prints files from both machines.

Maybe these are features of the new MPI? I'm using: Open MPI: 1.7.5
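
One way to check where the single-core behaviour comes from might be to look at the CPU affinity each process ends up with. The sketch below assumes Linux (it reads /proc/self/status). The commented mpiexec lines use --report-bindings and --bind-to none, which as far as I know are Open MPI options in the 1.7/1.8 series; that default process binding is the cause is only an assumption and is not confirmed here.

```shell
# Print the CPUs the current process is allowed to run on (Linux only).
allowed=$(awk '/^Cpus_allowed_list/ { print $2 }' /proc/self/status)
echo "allowed CPUs: $allowed"

# The same check per MPI rank (not run here; it needs the cluster):
#   mpiexec -n 2 --map-by node --hostfile ~/machines \
#       grep Cpus_allowed_list /proc/self/status
# A list covering only one core per rank would suggest that mpiexec
# pinned the rank to that core.
#
# Assumed alternatives to -cpus-per-proc, untested on this setup:
#   mpiexec -n 2 --map-by node --report-bindings --hostfile ~/machines ...
#   mpiexec -n 2 --map-by node --bind-to none --hostfile ~/machines ...
```

If the list printed under mpiexec is much shorter than the one printed without it, the restriction comes from mpiexec and not from powergraph.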


User 6 | 2/15/2015, 2:15:01 AM

I must be getting old... I never saw those MPI flags before. It must be a new version...

User 350 | 2/15/2015, 5:33:07 AM

Yeah, the previous version worked fine without those flags. So it works now, but I have another related question which I'll ask in a separate thread. Thanks Danny!