[PowerGraph] SSSP utilizes 1 machine only in the cluster

User 11 | 12/4/2015, 3:36:22 AM

Hi,

I am running the SSSP algorithm on 128 machines using the command below. The graph is a large road network, not a power-law graph. Since the diameter of the graph is large, it takes many iterations to finish. However, only one machine shows about 10% CPU utilization, while the others stay at around 1 or 2%.

mpiexec -f ~/slaves -n 128 \
    "$GRAPHLABDIR"/release/toolkits/graphanalytics/sssp \
    --source ${src} \
    --directed 1 \
    --engine sync \
    --graph_opts ingress=auto \
    --graph inputgraph \
    --saveprefix outputdir
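To illustrate why a large-diameter graph needs many iterations under a synchronous engine, here is a minimal single-process sketch in Python. It is not PowerGraph's implementation; the function names `sync_sssp` and `path_graph` are hypothetical, and a simple path graph stands in for a road network as an extreme case. It records how many vertices are active in each superstep. When that number is small, most of the 128 machines have nothing to do in that superstep, which would be consistent with low overall CPU utilization. It does not explain why one particular machine is busier than the rest.

```python
from collections import defaultdict


def sync_sssp(edges, source):
    """Superstep-based SSSP on a directed, non-negatively weighted graph.

    edges: iterable of (u, v, weight) tuples.
    Returns (dist, frontier_sizes), where frontier_sizes[i] is the number
    of active vertices in superstep i.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))

    dist = {source: 0.0}
    frontier = {source}
    frontier_sizes = []

    while frontier:
        frontier_sizes.append(len(frontier))
        updates = {}
        # Every active vertex proposes a distance to each out-neighbour,
        # reading only distances committed in earlier supersteps.
        for u in frontier:
            for v, w in adj[u]:
                cand = dist[u] + w
                best_known = min(dist.get(v, float("inf")),
                                 updates.get(v, float("inf")))
                if cand < best_known:
                    updates[v] = cand
        # Barrier: commit all improvements at once; only vertices whose
        # distance improved are active in the next superstep.
        dist.update(updates)
        frontier = set(updates)

    return dist, frontier_sizes


def path_graph(n):
    """Directed path 0 -> 1 -> ... -> n-1 with unit weights (diameter n-1)."""
    return [(i, i + 1, 1.0) for i in range(n - 1)]


if __name__ == "__main__":
    dist, sizes = sync_sssp(path_graph(1000), 0)
    print(len(sizes), max(sizes))
```

On the 1000-vertex path this prints `1000 1`: a thousand supersteps, each with a single active vertex. Real road networks have wider frontiers than a path, but the same effect applies whenever the frontier is small relative to the number of machines.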

I tried both auto and random partitioning, and the very same machine always has 10% utilization, while all the others are doing almost no work. This machine is the first one in the ~/slaves file. I looked through the source code for any place where more work is assigned to the MPI_PROCESS with rank=1 (the first in the ~/slaves file), with no luck.

Finally, please note that this machine also sends a large amount of data over the network to all other machines.

I would appreciate any explanation for this behavior.

Thanks, -K

Comments

User 1592 | 12/4/2015, 5:42:49 AM

Hi K, you are using a deprecated version of our code. We recommend switching to GraphLab Create: https://dato.com/products/create/docs/generated/graphlab.shortestpath.create.html#graphlab.shortestpath.create where we have an SSSP implementation that can scale to a graph of 100,000,000,000 edges on a single machine. We also have a distributed version: https://dato.com/products/distributed/features.html

Regarding your question, you may be seeing the graph finalization stage, where one machine coordinates the graph structure among all machines before the algorithm starts to run.


User 11 | 12/4/2015, 6:26:06 AM

Thank you, Danny, for your quick and useful answer. I did not know GraphLab is now deprecated. I have a couple of follow-up comments, though:

1- In my case, this behavior lasts for a couple of hours, until the execution is done. During loading/finalize, the CPU utilization increases by about 8-10% on all workers, including the first one in the ~/slaves file. The difference of almost 10% always exists.

2- Is there a manual or documentation for upgrading my scripts from GraphLab to GraphLab Create?

3- Finally, I understand that GraphLab Create is out-of-core. Is it possible to always keep the data in memory?

Thanks, -K