where to run partitioning algorithms

User 944 | 11/20/2014, 3:58:45 AM

Dears: There are 3 partitioning algorithms as random, coordiated, oblivious when loading and placing graph data, my question is which step to do this job and I guess it is graph.finalize(), is it correct ?


User 6 | 11/20/2014, 1:17:49 PM

Yes, This is documented <a href="https://github.com/graphlab-code/graphlab/blob/master/src/graphlab/graph/distributed_graph.hpp">here</a>:

  • ### Finalization
  • After all vertices and edges are inserted into the graph
  • via either load from file functions or direct calls to add_vertex() and
  • add_edge(), for the graph to the useable, it must be finalized. *
  • This is performed by calling \code graph.finalize(); \endcode on all
  • machines simultaneously. None of the load* functions perform finalization
  • so multiple load operations could be performed (reading from different
  • file groups) before finalization. *
  • The finalize() operation partitions the graph and synchronizes all
  • internal graph datastructures. After this point, all graph computation
  • operations such as engine, map_reduce and transform operations will
  • function. *
  • ### Partitioning Strategies *
  • The graph is partitioned across the machines using a "vertex separator"
  • strategy where edges are assigned to machines, while vertices may span
  • multiple machines. There are three partitioning strategies implemented.
  • These can be selected by setting --graph_opts="ingress=[partition_method]"
  • on the command line.
  • \li \c "random" The most naive and the fastest partitioner. Random places
  • edges on machines.
  • \li \c "oblivious" Runs at roughly half the speed of random. Machines
  • indepedently partitions the segment of the graph it
  • read. Improves partitioning quality and will reduce
  • runtime memory consumption. *
  • \li \c "grid" Runs at rouphly the same speed of random. Randomly places
  • edges on machines with a grid constraint.
  • This obtains quality partition, close to oblivious,
  • but currently only works with perfect square number of machines. *
  • \li \c "pds" Runs at roughly the speed of random. Randomly places
  • edges on machines with a sparser constraint generated by
  • perfect difference set. This obtains the highest quality partition,
  • reducing runtime memory consumption significantly, without load-time penalty.
  • Currently only works with p^2+p+1 number of machines (p prime).

User 944 | 11/24/2014, 12:30:00 PM

Copy it. Thanks a lot, Danny