Use GraphLab as a graph partitioning tool?

User 37 | 2/20/2014, 12:41:11 AM

Hi, I'm wondering is that possible to use GraphLab as a graph partitioning tool. More specificly, I want to have it output which worker gets which portion of the graph.

Thanks, Cui

Comments

User 33 | 2/20/2014, 3:55:14 AM

If you want to use a built-in online vertex-cut of GraphLab and save the partition results, you can try to write a new application or tailor an existing application (e.g., pagerank)

graph(...) // init (set partition type) graph.loadformat(...) // load graph from disk graph.finalize(...) // partition the graph graph.save(...) // store partitions to disk (http://docs.graphlab.org/usingsaving_answers.html)

If you want to implement an offline partitioning, there is a partitioning application in GraphLab, which uses normalized cut (svd and kmenas) doc: http://docs.graphlab.org/graphanalytics.html#graphanalytics_partitioning

Rong Institute of Parallel and Distributed Systems (IPADS) http://ipads.se.sjtu.edu.cn/projects/powerlyra.html


User 9 | 2/20/2014, 5:23:04 PM

rongchen, thanks for your response!


User 37 | 2/20/2014, 10:17:52 PM

Hi, I have some following up questions.

I'm going to follow the first approach (tailor the PageRank application and have it write the graph partitions). But how do I control the number of partitions.

I've just tried running it with 4 machine, each with 64 cores, and I expect it to output 4*64=256 partitions. But instead, it only outputs 16 partitions. Why it's 16, where does that number come from?

Thanks, Cui


User 33 | 2/21/2014, 1:53:18 AM

So far as I know.

partitions is equal to #MPI-instances, not #machines or #cores.

In general, #MPI-instances is also equal to #machines. Hence, I guess you just partition the graph into 4 pieces.

Graphlab writes 1 partition to multiple files in parallel to improve performance. Hence, #output-files is equal to #partitions * filepermachine (default is 4). So you got 16 files, not 16 partitions. The filepermachine is the last parameter of distributed_graph.save() function.

If you want try different number of partitions, I think you should change #MPI-instances. Using the command like: "mpiexec -f ~/mpd.hosts -n 256 ..." BTW, The results of partitioning are right, but don't believe the performance (maybe slow).

Rong Institute of Parallel and Distributed Systems (IPADS) http://ipads.se.sjtu.edu.cn/projects/powerlyra.html


User 37 | 2/24/2014, 3:07:30 PM

Hi, thanks for the help of Rong. I can now get the desired number of partitions.

And there's one thing I hope to make sure. I find there're 3 partitioning algorithms in the PowerGraph paper: Random, Oblivious, and Coordinated. Which of the 3 is the default one to be used in the GraphLab code distribution?

Thanks, Cui


User 33 | 2/25/2014, 2:25:25 AM

Hi, Cui

In graphlab code distirbution from PowerGraph paper random: ingress=random oblivious: ingress=oblivious coordinated: ingress=batch //has been deprecated, you need resume it in source code)

others grid: ingress=grid //from graphbuilder paper, only for #machine=XY, |X-Y|<=2 pds: ingress=pds //only for #machine=P^2+P+1

BTW: The order to select the default ingress : pds, grid, ovlivious

-- My project also provides some ingress methods for power-law graph and bipartite graph http://ipads.se.sjtu.edu.cn/projects/powerlyra.html

hybrid: ingress=hybrid ginger: ingress=hybrid_ginger bipartite: ingress=bipartite

Rong Institute of Parallel and Distributed Systems (IPADS) http://ipads.se.sjtu.edu.cn/projects/powerlyra.html


User 4272 | 5/12/2016, 1:54:33 PM

Hi,

I have a question stream graph partitioning in graph lab.

If I run a partitioning algorithm (for example oblivious) on a cluster with four instances, does power graph instantiates four identical independent partitioners? If yes, is there any possibility to share some information between these partitioners to get a better partitioning result?

Regards, Reza


User 33 | 5/13/2016, 2:19:50 AM

Hi

(Original) PowerGraph instantiates partitioner on each machine, and loads different parts of input graph in parallel. Therefore, when do partitioning, the partitioners will communicate with each other to exchange data (e.g., vertices, edges or anything you want).

-Rong


User 4272 | 5/13/2016, 6:24:26 AM

Hi,

Thanks for the reply Rong! Is there any example where partitioners communicate? In built-in partitioners (Random, ...), I can't find any communication.

/Reza


User 33 | 5/13/2016, 9:24:42 AM

The detail implementation of partitioner is at https://github.com/dato-code/PowerGraph/blob/master/src/graphlab/graph/ingress/distributedingressbase.hpp

The main work of partitioning in PowerGraph is implemented at finalize(), and the communication among partitioner is based on XXX_exchange.

If you want to add your own logic, maybe you can refer to the implementation of partitioning in PowerLyra (homepage: http://ipads.se.sjtu.edu.cn/projects/powerlyra.html, github: https://github.com/realstolz/powerlyra)

-Rong