Aggregation in GraphLab

User 1768 | 5/27/2015, 5:08:53 PM

I would like to know is there any possibility in Graphlab to do averaging in graph. My mean is each node (vertex) has a local double value and some direct friends and, then the vertices start to exchange their value and make an averaging for that value to converge a average value in the whole graph for all vertices. After convergence each node has a avg value instead of local value. This is similar to the work of mark Jelasity (aggregation based on peer sampling service) by Peersim. Thanks in advance to guide me

Comments

User 1178 | 5/28/2015, 4:18:52 AM

Hi naoomi,

You may use GraphLab Create's [tripleapply](https://dato.com/products/create/docs/generated/graphlab.SGraph.tripleapply.html?highlight=tripleapply#graphlab.SGraph.tripleapply) to achieve that. What triple_apply gives you is to be able to access vertex-edge-vertex triplets in parallel and do aggregation on top of that.

So the idea is the following:

For each iteration initialize each vertex with two fields: count = 1, sum = vertex['value'] triple_apply -- for each (src,edge,dst) triplet, accumulate count and sum for each src update each vertex to have new value (sum/count) check converge criteria, if not reach, continue next iteration

If you want more performant algorithm, you may use SDK to achieve that.

Ping


User 1592 | 5/28/2015, 8:00:54 AM

Hi The task you describe sounds like pagerank. I suggest looking into our pagerank implementation and also the weighted pagerank: https://github.com/dato-code/how-to/blob/master/tripleapplyweighted_pagerank.py


User 1768 | 5/28/2015, 11:14:42 AM

Thanks @Danny and @"Ping Wang" . I will try your suggestions.


User 1768 | 5/28/2015, 12:38:29 PM

Dear @"Ping Wang" hi again , I just have an unclear point, my mean is fully decentralized Gossip-based aggregation that every node just has one record and can not move to central location, and you know that the basic requirement of Gossip learning is to have an unbiased random sampling of nodes to avoid selecting high degree nodes several times. Because of the power-low of graphs such as social network. There are several techniques to do random sampling such as random walk, random traversal techniques (biased) and peer-sampling-service (unbiased and random) that that last one is the best one. In graph lab, how do you select this random nodes?


User 1178 | 5/29/2015, 6:19:49 PM

Hi naoomi,

GraphLab Create do not directly support the scenario you are describing, but you can alway build on top of what we provide. For example, using tripleapply, you can custom the code inside tripleapply to decide how you want to consolidate information from src-edge-dst triplets. You may put your own sampling technique and decide the consolidated result with each iteration.

Thanks!

Ping