GraphLab Create - Iteration order over edges

User 949 | 11/23/2014, 10:55:34 AM

Hello,

I am implementing the Label Propagation algorithm (http://arxiv.org/pdf/0709.2938v1). In this algorithm the order in which I iterate over all the edges of the graph must be random. In ordinary Python, I'd use something like random.shuffle; is there a way to guarantee randomness in GraphLab? Or can I at least assume that the order of execution is quite random as it is?

Comments

User 949 | 11/24/2014, 11:41:20 AM

Was I perhaps misunderstood? Please let me elaborate:

Let's say I was working with networkx, and that I were willing to go through the nodes or the edges in a random order.

I could do something like:

<code class="CodeInline">from random import shuffle # in-place random shuffle nodes = graph.nodes() shuffle(nodes) for n in nodes: pass

same for the edges:

edges = graph.edges() shuffle(edges) for e in edges: pass </code>

Even something as basic as just getting the order in which they are scheduled (to be executed in parallel by triple_apply) would be enough. So, does GraphLab support this feature?

Best,

g.


User 14 | 11/24/2014, 7:19:10 PM

Shuffling edges is going to be expensive for large graph. In triple_apply, edges are divided into blocks, and blocks are visited parallel in a particular order to minimize I/O cost.

If you can help us understand more about your use case, we will see how we can support fetching the order of edge visits.


User 949 | 11/25/2014, 12:53:55 PM

I am implementing the label propagation algorithm ( http://arxiv.org/abs/0709.2938 ). In this algorithm, every node copies the most common label amongst its neighbors. Nodes are updated asynchronously, and in a random order; i.e., in every round of the algorithm, the order in which a node gets to pick its labels is random, and may influence the pick of its neighbors that did not yet picked their label.

Anything that at least supplies some degree of randomness would suffice in my case.

=== In the bottom of the 4th page of the paper describing the algorithm, you can find this quote: <blockquote class="Quote"> "The problem however is that subgraphs in the network that are bi-partite or nearly bi-partite in structure lead to oscillations of labels (see figure 3). This is especially true in cases where communities take the form of a star graph. Hence we use asynchronous updating" </blockquote>


User 14 | 11/25/2014, 6:26:15 PM

One way to "simulate" some degree of randomness is to force some messages to be delayed or dropped. Suppose each edge holds a binary random variable x. The edge is skipped when x == 1. Next iteration, x is flipped to guarantee all edges have been visited. Essentially, each iteration is split into two executions, with one random subset of edges executed before the other random subset. This will not be efficient, however, you might want to use it as a proof of concept.


User 949 | 11/26/2014, 12:24:08 AM

Thanks.