Working with Amazon S3

User 350 | 7/31/2014, 5:45:54 PM

Hi,

I have the following questions regarding PowerGraph:

1) Is it possible to load graph files from S3? I tried passing the S3 link to the load() function, but got an error: "No files found matching s3-us-west-2..."

2) Is it possible to save outputs directly to S3?

I'm asking because I want to work with very large graphs that will not fit on the largest EC2 disk (1 TB).

Thanks a lot, Michael.

Comments

User 14 | 7/31/2014, 6:16:55 PM

GraphLab Create does support loading from and saving to S3. PowerGraph only supports local disk and HDFS.


User 350 | 7/31/2014, 6:30:27 PM

Thank you for the reply. Is it possible to write custom programs (gather, apply, scatter) using GraphLab Create?


User 14 | 7/31/2014, 6:59:16 PM

GraphLab Create supports another graph computation abstraction called tripleapply, which is easier to use and equally expressive. Check out the documentation here: http://graphlab.com/products/create/docs/generated/graphlab.SGraph.tripleapply.html. Right now, we only support user-defined functions written in Python, but exposing the C++ interface is on the roadmap.
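To give a feel for the abstraction, here is a rough pure-Python emulation of the idea: a user function receives the source vertex, the edge, and the destination vertex for every edge, and returns them (possibly modified). This is only a sketch and does not use GraphLab Create; `triple_apply`, `degree_count`, and the dict-based graph layout are hypothetical names made up for illustration. The real call and its signature are in the documentation linked above.

```python
# Pure-Python emulation of the triple-apply idea.
# NOTE: this is NOT the GraphLab Create API; all names here are hypothetical.

def triple_apply(vertices, edges, fn):
    """Call fn(src, edge, dst) once per edge and write the results back.

    vertices: dict mapping vertex id -> dict of vertex fields
    edges:    list of (src_id, dst_id, edge_fields_dict) tuples
    """
    for src_id, dst_id, edge in edges:
        src, edge_out, dst = fn(vertices[src_id], edge, vertices[dst_id])
        vertices[src_id] = src
        vertices[dst_id] = dst
        edge.update(edge_out)
    return vertices, edges


def degree_count(src, edge, dst):
    """User-defined function: count how many edges touch each vertex."""
    src["degree"] += 1
    dst["degree"] += 1
    return src, edge, dst


# A small triangle graph: a-b, b-c, a-c.
vertices = {v: {"degree": 0} for v in ("a", "b", "c")}
edges = [("a", "b", {}), ("b", "c", {}), ("a", "c", {})]

triple_apply(vertices, edges, degree_count)
print(vertices)
```

Every vertex in the triangle ends up with degree 2. A gather/apply/scatter program can often be restated this way, as one function applied per edge that updates both endpoints.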


User 350 | 7/31/2014, 7:05:14 PM

Thanks, I will check this. One last related question: can each machine store only part of the graph file to be loaded, or does every machine need to have all the parts of the graph file?


User 14 | 7/31/2014, 7:21:11 PM

GraphLab Create is currently single-machine only; however, it can efficiently handle graphs with around 2B edges.


User 350 | 8/1/2014, 12:26:47 AM

My question was about PowerGraph. Is it possible to load the input graph in such a way that each physical machine stores only part of it?


User 14 | 8/1/2014, 1:33:06 AM

Yes, but you need to modify this function: https://github.com/graphlab-code/graphlab/blob/master/src/graphlab/graph/distributed_graph.hpp#L2161.
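The general idea behind such a change can be sketched independently of PowerGraph: split the graph into several files and let each machine pick only its own share. The Python sketch below assumes the graph is already split into shard files and that each machine knows its index and the total machine count. `files_for_machine` and the shard names are hypothetical, and this is not the code of the linked function.

```python
# Round-robin assignment of graph shard files to machines.
# Assumption: hypothetical helper for illustration, not PowerGraph code.

def files_for_machine(all_files, proc_id, num_procs):
    """Return the shard files that machine `proc_id` should load."""
    ordered = sorted(all_files)  # every machine must agree on the order
    return [f for i, f in enumerate(ordered) if i % num_procs == proc_id]


# Ten hypothetical shard files spread over four machines.
shards = ["graph.part%03d" % i for i in range(10)]
assignment = {p: files_for_machine(shards, p, 4) for p in range(4)}

for proc_id, files in assignment.items():
    print(proc_id, files)
```

With this scheme each shard is read by exactly one machine, so a machine only needs its own shards on local disk. The sort matters: if machines list the files in different orders, some shards could be loaded twice or skipped.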


User 350 | 8/1/2014, 2:01:37 AM

Thanks a lot. I will try to figure this out. This would be very helpful, since I have a ~4 TB file while a single disk on EC2 is limited to 1 TB.
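As a back-of-the-envelope check on the numbers above, the minimum machine count for a partitioned input is the file size divided by the usable disk per machine, rounded up. The sketch below uses a hypothetical helper `min_machines`; the `usable_fraction` parameter is an assumption to leave room for output and temporary files.

```python
import math

def min_machines(total_tb, disk_tb, usable_fraction=1.0):
    """Smallest number of machines whose disks can hold the input.

    usable_fraction is an assumed share of each disk available for input.
    """
    return math.ceil(total_tb / (disk_tb * usable_fraction))


print(min_machines(4, 1))        # whole disk available for input
print(min_machines(4, 1, 0.75))  # keep 25% of each disk free
```

For a 4 TB file and 1 TB disks this gives 4 machines at the very least, and 6 if a quarter of each disk is kept free.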


User 10 | 8/1/2014, 5:43:38 PM

Have you looked at the HS1 instance type (http://aws.amazon.com/ec2/instance-types/#HS1)? It has 46TB of storage on 24 drives.


User 350 | 8/2/2014, 3:59:46 AM

Thanks for the reference! Will check this.