User 944 | 11/20/2014, 3:58:45 AM

Dears: There are 3 partitioning algorithms as random, coordiated, oblivious when loading and placing graph data, my question is which step to do this job and I guess it is graph.finalize(), is it correct ?

User 6 | 11/20/2014, 1:17:49 PM

Yes, This is documented <a href="https://github.com/graphlab-code/graphlab/blob/master/src/graphlab/graph/distributed_graph.hpp">here</a>:

- ### Finalization
- After all vertices and edges are inserted into the graph
- via either load from file functions or direct calls to add_vertex() and
- add_edge(), for the graph to the useable, it must be finalized. *
- This is performed by calling \code graph.finalize(); \endcode on all
- machines simultaneously. None of the load* functions perform finalization
- so multiple load operations could be performed (reading from different
- file groups) before finalization. *
- The finalize() operation partitions the graph and synchronizes all
- internal graph datastructures. After this point, all graph computation
- operations such as engine, map_reduce and transform operations will
- function. *
- ### Partitioning Strategies *
- The graph is partitioned across the machines using a "vertex separator"
- strategy where edges are assigned to machines, while vertices may span
- multiple machines. There are three partitioning strategies implemented.
- These can be selected by setting --graph_opts="ingress=[partition_method]"
- on the command line.
- \li \c "random" The most naive and the fastest partitioner. Random places
- edges on machines.
- \li \c "oblivious" Runs at roughly half the speed of random. Machines
- indepedently partitions the segment of the graph it
- read. Improves partitioning quality and will reduce
- runtime memory consumption. *
- \li \c "grid" Runs at rouphly the same speed of random. Randomly places
- edges on machines with a grid constraint.
- This obtains quality partition, close to oblivious,
- but currently only works with perfect square number of machines. *
- \li \c "pds" Runs at roughly the speed of random. Randomly places
- edges on machines with a sparser constraint generated by
- perfect difference set. This obtains the highest quality partition,
- reducing runtime memory consumption significantly, without load-time penalty.
- Currently only works with p^2+p+1 number of machines (p prime).

User 944 | 11/24/2014, 12:30:00 PM

Copy it. Thanks a lot, Danny