Will a very big node reduce the performance of GraphLab?

User 2616 | 11/16/2015, 8:48:56 AM

I've been reading the GraphLab source code recently, and I'm confused: when a vertex has millions or billions of edges pointing to it, it is not split across other processes. Won't this be a performance bottleneck? If not, how does it work? I don't know whether it is convenient for you to explain the details. Thank you!


User 2603 | 11/16/2015, 12:49:46 PM

Doesn't GraphLab use vertex-cut partitioning?

User 2616 | 11/16/2015, 2:03:01 PM

@lyysdy As far as I've read, I'm afraid so~ The code shows that edges are owned by their target vertex, and ghost vertices only appear on the boundaries of other processes.
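The edge-cut scheme described above can be sketched in a few lines. This is a minimal illustration, not actual GraphLab code: the function `edge_cut_ghosts` and the `owner` mapping are hypothetical names. The idea is that each vertex is owned by one machine, an edge is stored with the machine that owns one of its endpoints, and any remote endpoint gets a read-only "ghost" copy on the local machine:

```python
# Minimal sketch of edge-cut partitioning with ghost vertices
# (illustrative only; not GraphLab's actual implementation).
from collections import defaultdict

def edge_cut_ghosts(edges, owner):
    """owner: dict mapping vertex -> machine id.
    Returns, per machine, its local edges and the set of ghost
    vertices (remote endpoints it needs a replica of)."""
    local_edges = defaultdict(list)
    ghosts = defaultdict(set)
    for src, dst in edges:
        m = owner[src]               # edge stored with its source's owner
        local_edges[m].append((src, dst))
        if owner[dst] != m:          # endpoint lives elsewhere -> ghost copy
            ghosts[m].add(dst)
    return local_edges, ghosts

# A hub vertex keeps ALL of its edges on its owning machine:
owner = {0: 0, 1: 0, 2: 1}
local_edges, ghosts = edge_cut_ghosts([(0, 1), (0, 2)], owner)
```

Under this scheme a vertex with a billion edges keeps every one of those edges on its single owning machine, which is exactly the load imbalance the question is worried about.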

User 1592 | 11/17/2015, 1:51:10 PM

Hi, I would split the answer into two parts. Our latest GraphLab Create can handle up to hundreds of billions of edges on a single machine. We also have a distributed version that scales to even larger graphs. See the following customer talk video:

I guess your question addresses our older code base, PowerGraph. I suggest reading the following paper, which explains our gather-apply-scatter (GAS) model: In a nutshell, if a vertex has billions of edges, those edges are distributed across multiple machines and the vertex state is replicated among those machines.
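The vertex-cut GAS idea above can be sketched as follows. This is an illustrative toy, not PowerGraph's API: the names `partition_edges` and `gas_degree` are hypothetical, edges are spread round-robin for simplicity (PowerGraph uses smarter placement heuristics), and the computation is just an out-degree count standing in for a real gather:

```python
# Minimal sketch of PowerGraph-style vertex-cut + gather-apply-scatter
# (illustrative only; names and placement strategy are hypothetical).
from collections import defaultdict

def partition_edges(edges, num_machines):
    """Vertex-cut: assign each EDGE to a machine. A high-degree vertex's
    edges therefore span many machines, and its state is replicated
    (mirrored) on each of them."""
    machines = defaultdict(list)
    for i, (src, dst) in enumerate(edges):
        machines[i % num_machines].append((src, dst))
    return machines

def gas_degree(edges, num_machines=3):
    """One GAS round computing out-degrees over a vertex-cut graph."""
    machines = partition_edges(edges, num_machines)
    # GATHER: each machine computes a partial sum over its local edges only.
    per_machine_partials = []
    for local_edges in machines.values():
        local = defaultdict(int)
        for src, dst in local_edges:
            local[src] += 1          # partial out-degree on this machine
        per_machine_partials.append(local)
    # APPLY: the master replica of each vertex sums the mirrors' partials.
    degree = defaultdict(int)
    for local in per_machine_partials:
        for v, d in local.items():
            degree[v] += d
    # SCATTER would push the updated state back to the mirrors (omitted).
    return dict(degree)
```

Because each machine only gathers over its own slice of the hub's edges, no single machine has to touch all billion edges of a high-degree vertex; the per-vertex work is parallelized and only the small vertex state is replicated.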

User 2616 | 11/17/2015, 2:21:49 PM

@Danny Bickson Thank you very much! The content on the page is very useful, and I am downloading the paper from the website to read later. The papers and source code I've read seem too old. I have another question: does the software have a real-time version, so that we can construct the distributed graph while receiving the inputs? Thanks for your excellent products~

User 1592 | 11/18/2015, 8:22:57 AM

User 2616 | 11/18/2015, 8:50:42 AM

@"Danny Bickson" Aha, maybe your team will develop a new version that can be used for real-time work in the future~ :p

Thanks again for replying at such a late hour in your timezone. Good night~