Parallel synchronous tasks

User 912 | 2/13/2015, 3:56:39 PM


When I use the parallelforeach function, I expected the job to run its tasks synchronously and in parallel. However, I just noticed that it automatically runs in an asynchronous Local environment. If I instead create a simple (synchronous) Local environment, its maximum degree of parallelism appears to be set to 1, and there is no documentation on how to change that.

So, my question is: is there a way to run parallel tasks synchronously with parallelforeach, and if so, can you please explain how?

Regards, Vyara


User 91 | 2/13/2015, 9:09:10 PM

Almost all GraphLab data structures, models, and utilities are implemented natively in C++ with efficient multi-core implementations. So, if your machine has 40 cores, model training will use all 40 cores.

As a result, it may not be efficient to run multiple processes in parallel when each of them is trying to use all of your cores. Hence, parallelforeach on your local machine uses only 1 process. You can run distributed jobs on your cluster by using environment = EC2.

User 912 | 2/14/2015, 2:27:02 PM

So, as I understand it, parallelforeach can be "parallel" only if I run it in an environment such as Hadoop or EC2. Is that correct? Is there no way to achieve parallelism on a local machine (that is, to run multiple tasks at once and wait for them synchronously)?

User 91 | 2/14/2015, 6:47:55 PM

Yes, that is correct.
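For tasks that are plain Python functions and do not each saturate every core, one possible workaround outside of parallelforeach is Python's standard concurrent.futures module. The sketch below is not a GraphLab API: the helper name run_tasks_blocking and its parameters are made up for illustration. It runs the tasks on a local worker pool and blocks until all of them finish, so the call is synchronous from the caller's point of view. As noted above, if each task already trains a multi-core model, running several at once may not be any faster.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_tasks_blocking(task, inputs, max_workers=2, use_processes=True):
    """Run task(x) for every x in inputs on a local pool and wait for all results.

    Hypothetical helper for illustration only; it is not part of GraphLab.
    Results are returned in the same order as inputs.
    """
    # Processes give true CPU parallelism; threads are enough for I/O-bound tasks.
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map schedules the calls concurrently; list() blocks until
        # every result is available, so the caller sees a synchronous call.
        return list(pool.map(task, inputs))
```

With use_processes=True, the task and its inputs must be picklable, the task should be a function defined at module level, and the call should sit under an `if __name__ == "__main__":` guard. Whether a given GraphLab object can be passed to a worker process this way is an assumption you would need to check.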