Significant slowdown with omni_engine on multiple nodes

User 965 | 11/21/2014, 7:04:31 AM

I am trying to write an implementation of Luby's algorithm. I broke the algorithm down into a series of map-reduce stages, but I'm getting some odd timing behavior when I try to scale up to multiple nodes. When I run my implementation on a single node on a graph of half a million vertices, it executes in a few seconds. However, when I scale up to multiple nodes, completing the scan takes on the order of 1000 times longer. Adding more nodes just makes it worse.
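For reference, here is a minimal sequential sketch of the kind of Luby-style maximal-independent-set round structure being described (my own simplified, single-machine version, not the GraphLab implementation; the names `luby_mis`, `State`, etc. are made up for illustration). Each round, every undecided vertex draws a random priority; a vertex that beats all undecided neighbors is accepted, and its neighbors are discarded:

```cpp
#include <cstdlib>
#include <vector>

// Vertex states, mirroring the accepted/discarded states in the question.
enum State { undecided, accepted, discarded };

// Sequential sketch of Luby's MIS algorithm (illustrative only).
std::vector<State> luby_mis(const std::vector<std::vector<int>>& adj) {
  std::vector<State> state(adj.size(), undecided);
  std::vector<double> prio(adj.size(), 0.0);
  bool any_undecided = !adj.empty();
  while (any_undecided) {
    // 1. Every undecided vertex draws a random priority.
    for (size_t v = 0; v < adj.size(); ++v)
      if (state[v] == undecided)
        prio[v] = std::rand() / (RAND_MAX + 1.0);
    // 2. A vertex beating all undecided neighbors joins the independent
    //    set; its neighbors are discarded (the "remove neighbors" step).
    for (size_t v = 0; v < adj.size(); ++v) {
      if (state[v] != undecided) continue;
      bool local_max = true;
      for (int u : adj[v])
        if (state[u] == undecided && prio[u] >= prio[v]) {
          local_max = false;
          break;
        }
      if (local_max) {
        state[v] = accepted;
        for (int u : adj[v])
          if (state[u] == undecided) state[u] = discarded;
      }
    }
    // 3. Repeat until every vertex is decided.
    any_undecided = false;
    for (size_t v = 0; v < adj.size(); ++v)
      if (state[v] == undecided) any_undecided = true;
  }
  return state;
}
```

Steps 1 and 2 correspond to the gather/apply stages of the distributed vertex programs; each outer iteration corresponds to one engine run.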

I've tracked the problem down to the omni_engine calls, which go from taking tens of milliseconds per iteration on 1 node to 500 milliseconds per iteration on multiple nodes. This is the same whether I run the "async" or "sync" omni_engine. The code for each part of the vertex program consists of a single if statement and a return, so there's not much optimization that can be done there.

Is breaking the algorithm into a set of vertex programs for omni_engine the right approach here? What could explain the huge slowdown?

The call I'm using to run the vertex programs is:

    graphlab::omni_engine<remove_neighbors> remove_engine(dc, graph, "async");
    remove_engine.signal_all();
    remove_engine.start();

And here's an example of one of the vertex programs (there are 4, all pretty similar):

    class remove_neighbors : public graphlab::ivertex_program<graph_type, int>,
                             public graphlab::IS_POD_TYPE {
    public:
      edge_dir_type gather_edges(icontext_type& context, const vertex_type& vertex) const {
        return graphlab::IN_EDGES;
      }

      int gather(icontext_type& context, const vertex_type& vertex, edge_type& edge) const {
        if (edge.source().data().state == accepted) return -1;
        return 0;
      }

      void apply(icontext_type& context, vertex_type& vertex, const gather_type& total) {
        if (total < 0) vertex.data().state = discarded;
      }

      edge_dir_type scatter_edges(icontext_type& context, const vertex_type& vertex) const {
        return graphlab::NO_EDGES;
      }
    };


User 6 | 11/21/2014, 8:18:47 AM

You have to take into account the difficulties of distributed execution (which is not specific to GraphLab, by the way). As long as your problem is small enough to be solved on a single computer in a few milliseconds, there is no reason to try to solve it using multiple computers, as the network and communication delay will slow you down significantly.

Additionally, the update you perform has essentially zero computational cost. Algorithms that perform more computation per vertex, where the CPU is the bottleneck, benefit far more from throwing in additional machines.
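A back-of-envelope model makes this concrete (my own simplification with made-up constants, not measured GraphLab numbers): per-superstep time is roughly the compute that can be parallelized plus a fixed per-superstep communication overhead. When the update function is nearly free, the fixed network term dominates and extra machines cannot help:

```cpp
#include <cmath>

// Toy cost model: time for one superstep, in seconds.
// compute_per_vertex: CPU time per vertex update (parallelizable).
// network_overhead:   fixed per-superstep communication/barrier cost.
double superstep_seconds(double compute_per_vertex, long vertices,
                         int machines, double network_overhead) {
  return compute_per_vertex * vertices / machines + network_overhead;
}
```

For example, with a (hypothetical) 10 ns update on 500,000 vertices, a single machine finishes a superstep in about 5 ms with no network cost; with a 0.5 s per-superstep communication cost, even 8 machines take just over 500 ms, matching the per-iteration times reported in the question.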

To summarize: your problem is too small to gain effectively from distributed execution, and the update step is too lightweight to allow a good speedup.