GraphLab as a Memory Cloud

User 949 | 11/23/2014, 10:59:55 AM

Is there a way to deploy GraphLab Create over a cluster (preferably running Hadoop/YARN) so that it can use the cluster's combined RAM as a memory cloud, allowing me to store huge graphs in memory?

I am thinking of something like the Trinity memory cloud (see a research paper that uses it: http://vldb.org/pvldb/vol5/p788zhaosunvldb2012.pdf , and the Microsoft Research paper describing it: http://research.microsoft.com/pubs/183710/Trinity.pdf ), but I naturally prefer GraphLab over any Microsoft product :) .

Comments

User 14 | 11/23/2014, 9:23:19 PM

Hi,

Again, a very interesting idea. But it is not clear to me how deploying over Hadoop/YARN would give you distributed memory. You would need some sort of communication layer to provide virtual memory addressing across all nodes, and GraphLab Create would then need to be configured to use it.
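To make the idea of an addressing layer concrete, here is a toy sketch in Python. It is not part of GraphLab Create, and the class and method names (`MemoryCloud`, `owner`, `put`, `get`) are hypothetical. Each "node" is just a local dict standing in for one machine's RAM; a real layer would replace the dict access with a network call.

```python
class MemoryCloud:
    """Toy stand-in for a distributed in-memory store (hypothetical)."""

    def __init__(self, num_nodes):
        # One dict per simulated node; in a real cluster each of these
        # would live in a different machine's RAM.
        self.nodes = [dict() for _ in range(num_nodes)]

    def owner(self, vertex_id):
        # Map a global vertex ID to the index of the node that stores it.
        return vertex_id % len(self.nodes)

    def put(self, vertex_id, neighbors):
        # Store the adjacency list on the owning node.
        self.nodes[self.owner(vertex_id)][vertex_id] = list(neighbors)

    def get(self, vertex_id):
        # Look up the adjacency list on the owning node; empty if absent.
        return self.nodes[self.owner(vertex_id)].get(vertex_id, [])


cloud = MemoryCloud(num_nodes=3)
cloud.put(7, [1, 2])   # stored on node 7 % 3
cloud.put(8, [7])      # stored on node 8 % 3
```

The caller only ever sees global vertex IDs; which node holds the data is decided by `owner`. Modulo placement is just the simplest choice for the sketch, not a claim about how Trinity or any GraphLab component does it.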

On a single machine with sufficient disk space, SGraph can handle more than 2 billion edges and hundreds of millions of vertices. Distributed SFrame and SGraph are near-term next steps on our roadmap.

Thanks, Jay


User 949 | 11/24/2014, 6:01:39 AM

Very well. Whether such a communication layer is a part of GraphLab is exactly what I wanted to know. Thanks for your quick reply.