User 1886 | 5/5/2015, 4:57:21 PM
First, I would like to congratulate the developers of graphlab/dato - it seems like a really great tool for solving graph-based problems!
I am a beginner with graphlab, and I need your help to understand how to use its full potential.
Here is my problem.
I have developed a new pagerank-based algorithm, and I want to parallelize it to speed it up. My MacBook Pro has 8 cores, and I also have access to a linux machine with 16 cores.
So I looked at this graphlab code for pagerank using the triple_apply function: https://github.com/dato-code/how-to/blob/master/triple_apply_weighted_pagerank.py and I modified it to fit my pagerank algorithm. I got the same output solution (in terms of values) from the graphlab code and from the serial python code for my pagerank-based algorithm.
The issue is that the graphlab python code is 100x SLOWER than the serial python code.
I thought the issue came from my graphlab code, so I implemented the standard pagerank algorithm in serial python (see below) and compared it with https://github.com/dato-code/how-to/blob/master/triple_apply_weighted_pagerank.py The graphlab version is still 100x slower than the serial one.
So here are my questions: (1) Why is the graphlab pagerank algorithm slow? (2) How can I make it fast?
Thanks a lot for your help,
Xavier
PS. My serial python code for the pagerank update:

    for iter in range(10):
        # Pagerank update
        F = (1 - reset_prob) * W.dot(F) + reset_prob
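For context, here is a minimal self-contained version of that serial update in NumPy. The toy edge list, graph size, and initial ranks are made-up assumptions for illustration only (they are not my real data); W is built so that each node distributes its rank evenly over its out-neighbors, matching the standard pagerank formulation:

```python
import numpy as np

# Hypothetical toy directed graph as an edge list (src, dst) -- illustrative only.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
n = 4
reset_prob = 0.15

# Out-degree of each source node.
out_deg = np.zeros(n)
for s, d in edges:
    out_deg[s] += 1

# W[dst, src] = 1 / outdegree(src): column s spreads node s's rank
# evenly across its out-neighbors.
W = np.zeros((n, n))
for s, d in edges:
    W[d, s] = 1.0 / out_deg[s]

# Power iteration matching the update in the PS:
# F = (1 - reset_prob) * W F + reset_prob, with the reset added per node.
F = np.ones(n)
for _ in range(10):
    F = (1 - reset_prob) * W.dot(F) + reset_prob
```

On this toy graph, node 2 (which receives edges from nodes 0, 1, and 3) ends up with the highest rank, while node 3 (which receives no edges) stays at the reset value.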