GraphLab reduces memory usage over time!

User 11 | 3/3/2014, 5:05:25 PM

Hi all,

I was running the SSSP algorithm for multiple source vertices. I basically load the graph once and then iterate over all the given source vertices. After running the engine for each source vertex, I reset every vertex's distance with "graph.transform_vertices(init_vertex);".
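For reference, the loop looks roughly like this. It is only a sketch: it assumes the sssp vertex program, the vertex_data/edge_data structs, and the min_distance_type message from the PowerGraph SSSP toolkit (toolkits/graph_analytics/sssp.cpp) are in scope, and "sources" is just the list of vertex IDs I want to process.

```cpp
#include <limits>
#include <vector>
#include <graphlab.hpp>

// Assumes vertex_data, edge_data, sssp, and min_distance_type from the
// PowerGraph SSSP toolkit (toolkits/graph_analytics/sssp.cpp).
typedef graphlab::distributed_graph<vertex_data, edge_data> graph_type;

// Reset every vertex distance to "infinity" before the next run.
void init_vertex(graph_type::vertex_type& vertex) {
  vertex.data().dist = std::numeric_limits<float>::max();
}

void run_all_sources(graphlab::distributed_control& dc, graph_type& graph,
                     graphlab::command_line_options& clopts,
                     const std::vector<graphlab::vertex_id_type>& sources) {
  for (size_t i = 0; i < sources.size(); ++i) {
    graph.transform_vertices(init_vertex);  // reset all distances
    // A fresh engine per source; the graph itself is loaded only once.
    graphlab::omni_engine<sssp> engine(dc, graph, "async", clopts);
    engine.signal(sources[i], min_distance_type(0));  // seed one source
    engine.start();  // run SSSP to convergence
  }
}
```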

I noticed that GraphLab initially allocates 11.5 GB of memory (across the whole cluster), but after processing several source vertices the memory usage gradually decreased, reaching about 6 GB after roughly 150 source vertices. This is amazing performance. My guess is that GraphLab learns the workload and redistributes edges across the cluster nodes to reduce memory (replication). My question is, if this is the case, how does GraphLab do it? Is there an internal engine that monitors the workload?

If my guess is not accurate, do you have an explanation for this behavior?

Thanks, -Khaled

Comments

User 20 | 3/4/2014, 5:46:49 PM

Hi,

There is a simpler explanation for this behavior :-)

There is a little more memory utilization during initial graph loading due to a number of intermediate buffers that handle graph partitioning and distribution. After graph loading, these buffers are released and the malloc implementation (tcmalloc) is then free to release some of this memory back to the system, which it may do.
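You can watch this effect directly through tcmalloc's MallocExtension interface from gperftools. Here is a small stand-alone illustration (not GraphLab code; the 512 MB buffer just stands in for our loading/partitioning buffers; build with -ltcmalloc):

```cpp
#include <cstdio>
#include <cstdlib>
#include <gperftools/malloc_extension.h>

static void report(const char* label) {
  size_t heap = 0, in_use = 0;
  // heap_size: bytes tcmalloc has reserved from the OS.
  // current_allocated_bytes: bytes actually handed out to the program.
  MallocExtension::instance()->GetNumericProperty("generic.heap_size", &heap);
  MallocExtension::instance()->GetNumericProperty(
      "generic.current_allocated_bytes", &in_use);
  std::printf("%-14s heap=%zu MB  in_use=%zu MB\n",
              label, heap >> 20, in_use >> 20);
}

int main() {
  report("start");
  // Stand-in for an intermediate loading/partitioning buffer.
  size_t n = 512u << 20;  // 512 MB
  char* buffer = static_cast<char*>(std::malloc(n));
  for (size_t i = 0; i < n; i += 4096) buffer[i] = 1;  // touch the pages
  report("after alloc");
  std::free(buffer);  // freed, but tcmalloc may keep the pages cached
  report("after free");
  // tcmalloc's heuristics return cached pages to the kernel over time;
  // this call forces it, which is what the OS eventually observes.
  MallocExtension::instance()->ReleaseFreeMemory();
  report("after release");
  return 0;
}
```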

Yucheng


User 11 | 3/8/2014, 2:24:48 PM

Thank you Yucheng,

Thank you for pointing out the extra memory utilization during initial loading and the finalize stage. My comment was about what happens after the graph is loaded and an algorithm (SSSP in my case) has been executed. I made some changes to SSSP so that it processes multiple source vertices instead of only one. After processing several sources (executing SSSP several times) the memory is stable, but after processing enough of them (the number is not deterministic) the memory starts to decrease as monitored by the OS. However, I noticed that GraphLab itself reports a fixed memory utilization across all SSSP runs.

I must note that the cause could be a bug in my code that initializes the vertex data (setting the distance to maximum). However, this is not an issue for me at the moment because I do not need to execute SSSP several times with one loaded graph. That being said, I'm willing to provide more details and track the issue down if you are interested.

Thank you very much for answering our questions and building a positive community around GraphLab :)

-Khaled


User 20 | 3/17/2014, 11:46:56 PM

If GraphLab reports a constant memory utilization, then there is no issue: we basically just ask tcmalloc how much heap space it has allocated. However, tcmalloc does have some built-in heuristics for releasing unused heap memory back to the system, so you might be encountering that. In this case, it is certainly not an issue, and is definitely a good thing :-)
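To make the distinction concrete: the number GraphLab prints comes from tcmalloc's heap statistics, while top/ps show resident pages, and the two can diverge once tcmalloc hands unused pages back to the kernel. A small Linux-only sketch of reading both (build with -ltcmalloc):

```cpp
#include <cstdio>
#include <unistd.h>
#include <gperftools/malloc_extension.h>

int main() {
  // The engine-side view: tcmalloc's notion of its heap size.
  size_t heap = 0;
  MallocExtension::instance()->GetNumericProperty("generic.heap_size", &heap);

  // The OS view: resident set size from /proc/self/statm (in pages).
  long pages_total = 0, pages_resident = 0;
  FILE* statm = std::fopen("/proc/self/statm", "r");
  if (statm) {
    std::fscanf(statm, "%ld %ld", &pages_total, &pages_resident);
    std::fclose(statm);
  }
  long rss = pages_resident * sysconf(_SC_PAGESIZE);

  // After tcmalloc releases unused pages, rss can drop while the heap
  // statistic stays flat, which is exactly the pattern described above.
  std::printf("tcmalloc heap_size: %zu MB\n", heap >> 20);
  std::printf("OS resident size:   %ld MB\n", rss >> 20);
  return 0;
}
```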

Thanks, Yucheng