There are three partitioning algorithms (random, coordinated, oblivious) used when loading and placing graph data. My question is which step performs the partitioning. I guess it is graph.finalize(); is that correct?
User 6 | 11/20/2014, 1:17:49 PM
Yes. This is documented <a href="https://github.com/graphlab-code/graphlab/blob/master/src/graphlab/graph/distributed_graph.hpp">here</a>:
After all vertices and edges are inserted into the graph
via either load from file functions or direct calls to add_vertex() and
add_edge(), for the graph to be usable, it must be finalized.
This is performed by calling `graph.finalize();` on all
machines simultaneously. None of the load* functions perform finalization
so multiple load operations could be performed (reading from different
file groups) before finalization.
The finalize() operation partitions the graph and synchronizes all
internal graph data structures. After this point, all graph computation
operations such as engine, map_reduce and transform operations will
### Partitioning Strategies
The graph is partitioned across the machines using a "vertex separator"
strategy where edges are assigned to machines, while vertices may span
multiple machines. Four partitioning strategies are implemented.
They can be selected by setting --graph_opts="ingress=[partition_method]"
on the command line.
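To make the "vertex separator" idea concrete, here is a minimal, self-contained sketch of random edge placement. It is not GraphLab code: the names `Edge`, `ReplicaMap`, `place_edges_randomly` and `replication_factor` are hypothetical and exist only for this illustration. Each edge lands on exactly one machine, and a vertex is replicated on every machine that holds one of its edges. The average number of replicas per vertex (often called the replication factor) is one rough way to compare partition quality.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <set>
#include <utility>
#include <vector>

// Hypothetical types for this sketch only (not part of GraphLab).
using Edge = std::pair<std::uint32_t, std::uint32_t>;               // (source, target)
using ReplicaMap = std::map<std::uint32_t, std::set<std::size_t>>;  // vertex -> machines

// Assign every edge to one uniformly random machine and record, for each
// vertex, the set of machines that hold at least one of its edges.
ReplicaMap place_edges_randomly(const std::vector<Edge>& edges,
                                std::size_t num_machines,
                                std::uint64_t seed) {
  assert(num_machines > 0);
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, num_machines - 1);
  ReplicaMap replicas;
  for (const Edge& e : edges) {
    const std::size_t machine = pick(rng);  // the edge lives on exactly one machine
    replicas[e.first].insert(machine);      // both endpoints get a replica there
    replicas[e.second].insert(machine);
  }
  return replicas;
}

// Average number of machines spanned by a vertex; 1.0 means no vertex is split.
double replication_factor(const ReplicaMap& replicas) {
  if (replicas.empty()) return 0.0;
  std::size_t total = 0;
  for (const auto& entry : replicas) total += entry.second.size();
  return static_cast<double>(total) / static_cast<double>(replicas.size());
}
```

In this toy model a high-degree vertex tends to be replicated on many machines under purely random placement, which is the cost the other strategies try to reduce.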
- `random`: The most naive and the fastest partitioner. Places edges on
  machines at random.
- `oblivious`: Runs at roughly half the speed of random. Each machine
  independently partitions the segment of the graph it read. Improves
  partitioning quality and reduces runtime memory consumption.
- `grid`: Runs at roughly the same speed as random. Randomly places
  edges on machines subject to a grid constraint. This obtains a quality
  partition, close to oblivious, but currently only works with a
  perfect-square number of machines.
- `pds`: Runs at roughly the speed of random. Randomly places edges on
  machines subject to a sparser constraint generated by a perfect
  difference set. This obtains the highest-quality partition, reducing
  runtime memory consumption significantly, without a load-time penalty.
  Currently only works with p^2+p+1 machines (p prime).
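The machine-count requirements above can be checked before launching a job, and the grid constraint can be illustrated with a small sketch. The code below is an assumption-laden illustration, not GraphLab's implementation: all function names are hypothetical, and the grid rule shown (a vertex may use the machines in the row and column of its home machine, and an edge goes to a machine allowed by both endpoints) is one common way to realize such a constraint.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// True if n is a positive perfect square (the machine count "grid" needs).
bool is_perfect_square(std::size_t n) {
  std::size_t r = 0;
  while (r * r < n) ++r;
  return n > 0 && r * r == n;
}

// Trial-division primality test; fine for realistic machine counts.
bool is_prime(std::size_t p) {
  if (p < 2) return false;
  for (std::size_t d = 2; d * d <= p; ++d) {
    if (p % d == 0) return false;
  }
  return true;
}

// True if n == p^2 + p + 1 for some prime p (the machine count "pds" needs).
bool is_pds_machine_count(std::size_t n) {
  for (std::size_t p = 2; p * p + p + 1 <= n; ++p) {
    if (p * p + p + 1 == n && is_prime(p)) return true;
  }
  return false;
}

// Machines are numbered row by row in a side x side grid. A vertex whose
// home machine is `home` may place edges on any machine in that row or column.
std::set<std::size_t> grid_candidates(std::size_t home, std::size_t side) {
  assert(side > 0 && home < side * side);
  const std::size_t row = home / side;
  const std::size_t col = home % side;
  std::set<std::size_t> out;
  for (std::size_t i = 0; i < side; ++i) {
    out.insert(row * side + i);  // same row
    out.insert(i * side + col);  // same column
  }
  return out;
}

// Machines allowed for an edge: those permitted by both endpoints.
// A row and a column of the grid always cross, so the result is never empty.
std::set<std::size_t> edge_candidates(std::size_t home_u, std::size_t home_v,
                                      std::size_t side) {
  const std::set<std::size_t> a = grid_candidates(home_u, side);
  const std::set<std::size_t> b = grid_candidates(home_v, side);
  std::set<std::size_t> both;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(both, both.begin()));
  return both;
}
```

For example, on a 3x3 grid, `edge_candidates(0, 4, 3)` leaves machines 1 and 3, so a vertex can be replicated on at most 2 * side - 1 machines rather than on all of them. Valid `pds` machine counts under the stated rule include 7 (p = 2), 13 (p = 3) and 31 (p = 5).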