Distributed Computations

User 761 | 12/8/2014, 11:11:26 AM

Hi! I have some basic questions about how GraphLab uses multiple cores or multiple nodes (in a Hadoop cluster) to do computation.

Case A: Single machine, multiple cores

Case B: Multiple nodes in a cluster

Two questions:

1. Does GraphLab parallelise all computations that can be done in parallel (that is, use all available cores/nodes)?

2. Also, say I have a classification task. I run experiments and build a model, which I save. Now, when I load the saved model and run it against new data to generate labels using classifier.evaluate, is that computation done in parallel (assuming multiple cores/nodes are available)? Suppose I'm running 100 new samples through the saved classifier. All 100 computations are essentially independent. I ask because we will potentially be running tens of millions of samples through the saved classifier to generate labels, in which case a non-parallel implementation would be very expensive.

Thanks!

Comments

User 6 | 12/8/2014, 2:06:56 PM

Hi. So far, GraphLab Create uses all available cores on a multicore machine (even on a Hadoop cluster, we use only one machine). We are working on the next version, which will support distributed computation as well.

In terms of model serving, it is possible, and supported, to run several servers that compute the classification in parallel. So far we use Amazon EC2 as the platform for those servers. Load balancing, caching, failover, and hot swapping of models are all supported. We would be happy to hear about your use case in more detail and give advice. Feel free to email me directly to set up a meeting.
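To make the "independent samples" point from the question concrete, here is a minimal, generic Python sketch of splitting a batch into chunks and scoring the chunks concurrently. It does not use GraphLab Create's API: `predict_chunk` is a hypothetical stand-in for a saved classifier, and `chunked`, `predict_parallel`, the chunk size, and the worker count are all made-up names and values for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_chunk(chunk):
    """Hypothetical stand-in for a saved classifier: label 1 if the value is positive."""
    return [1 if x > 0 else 0 for x in chunk]


def predict_parallel(samples, chunk_size=25, workers=4):
    """Score independent samples chunk by chunk and return labels in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the input chunks.
        results = pool.map(predict_chunk, chunked(samples, chunk_size))
        return [label for chunk_labels in results for label in chunk_labels]


samples = [x - 50 for x in range(100)]  # 100 made-up samples
labels = predict_parallel(samples)
```

Threads keep the sketch simple. For CPU-bound scoring written in pure Python, swapping in `ProcessPoolExecutor` would be the usual choice, since threads in CPython do not run Python bytecode on several cores at once.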


User 745 | 1/22/2015, 10:55:26 PM

Hi, will it be possible to run k-means distributed as well?


User 18 | 1/28/2015, 12:53:10 AM

@Takabayashi, not right now, but it's coming.


User 1375 | 3/5/2015, 7:52:30 PM

Until the distributed version of GraphLab Create comes out, I assume that when one obtains an SFrame from a Spark RDD, all the distributed data in the RDD is copied to the local, non-distributed SFrame. Is that correct?