User 937 | 11/11/2014, 9:44:39 PM

In the example apps, I can only see the pagerank on an directed graph. To compute it on a directed graph, it seems that pagerank should work well by just calling all edges during iterations. But the init stage makes me headache, where a edge cannot serve as read and write storage at the same time.

Any suggestion would be appreciated. Thanks!

User 6 | 11/12/2014, 9:01:43 AM

I guess you are asking about GraphChi? Your question is not clear, since the implementation can handle undirected graph. Simply list both edge directions in the input file. Basically we aggregate all input weights from the incoming edges, compute the weighted mean and send it to the outgoing edges. If the graph is undirected it is simply the same set of edges.

Please note that we are going to release a newer open source soon - Graphlab Create which has a python interface. See: http://graphlab.com/products/create/docs/generated/graphlab.pagerank.create.html#graphlab.pagerank.create

User 937 | 11/12/2014, 7:22:59 PM

Thanks for your reply. Sorry I forget to mention it's GraphChi.

What I mean is that since all vertices are computed in sequence, we can use them to serve as two usages at the same iteration.

For example, we have two vertices a and b, a < b. For the edge(a,b), when we update vertex a, we read data from it. After finish updating the vertex value, we can immediately update the edge(a, b) with some new weight. So if we want to update vertex b, we can directly use the new weight of edge(a, b). But this method has a problem that the init stage would be very hard to write.

So that means we have to double edges in the input file to do computations on a undirected graph, is that right?

User 6 | 11/13/2014, 6:06:32 AM

This issue is well known and is the difference between <a href="http://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method">Jacobi and Gauss Seidel</a> iterations (namely synchronous vs. asynchronous execution).

Please note that for pagerank, it will be much more efficient to store the vertex values in memory in an array, that way you do not need to send messages at all and can use the latest values. GraphChi was written with the assumption you do not have enough memory to save the vertex values in memory and thus everything is written to disk.

Anyway you are welcome to change the source code as you like to try out the different methods.