Optimizations based on graph characteristics

User 90 | 3/24/2014, 7:47:12 AM

How does graph diameter and degree-skewness affect the performance of applications on GraphLab? Are there any specific optimizations that are used for high/low diameter graphs and maybe power-law graphs?

I am looking at following data for 2 programs in graph-analytics toolkit. LiveJournal is 4.8M vertices & 68.9M edges whereas RoadNet-CA is 1.9M vertices & 5.5M edges.

PageRank: LiveJournal: 295 sec (sync) ;; 716 sec (async) RoadNet-CA: 129 sec (sync) ;; 604sec (async)

SSSP: LiveJournal: 15 sec (sync) ;; 730 sec (async) RoadNet-CA: 82 sec (sync) ;; 60 sec (async)

Why does RoadNet-CA perform reverse in SSSP (async faster than sync) compared to PageRank (sync way faster than async)?

Comments

User 33 | 3/24/2014, 3:52:39 PM

Hi, Keval

The current version (2.2) of GraphLab uses PowerGraph engine, which is optimized for power-law graph by evenly partitioning workload of vertices to all machines of cluster. http://graphlab.org/resources/publications.html

PowerLyra is another work aiming at power-law graph based on GraphLab by differentiated partitioning and processing of high-degree and low-degree vertices. http://ipads.se.sjtu.edu.cn/projects/powerlyra.html

The performance of sync and async mode depends on not only input graph but also algorithms. For SSSP with high diameter graphs, graphlab requires a large number of iterations (850+) for convergence in sync mode. It means very high performance cost from global barrier in each iteration and lower parallelism due to few active vertices. So async mode is relative good.

PowerSwith provides more detail comparison between sync and async modes of graphlab. http://ipads.se.sjtu.edu.cn/projects/powerswitch.html

The following paper analyses the issues on high diameter graph, but the solution, priority schedulers, may not be worked on distributed environment. "A Lightweight Infrastructure for Graph Analytics" (SOSP'13) http://sigops.org/sosp/sosp13/papers/p456-nguyen.pdf