Iterations in Graphlab ALS vs iterations in Spark ALS

User 1596 | 3/19/2015, 7:13:40 PM

I was trying to run the example ALS program in the collaborative filtering toolkit of GraphLab and compare it with MLlib ALS. In GraphLab ALS, --maxiter specifies the maximum number of updates allowed for a vertex. How would this --maxiter map to the iterations in MLlib ALS? Basically, I am trying to understand how a fair comparison can be made between the two frameworks in terms of the computation required.

Comments

User 91 | 3/20/2015, 5:45:57 AM

If you are talking about PowerGraph ALS, then I strongly recommend that you try out Graphlab-Create's recommender system. It has SGD, ALS, and several other algorithms. It also has other models, such as Factorization Machines and Ranking Factorization, which can use side features from users and items.

SGD for matrix factorization is known to be better than ALS for accuracy, so a speed comparison isn't really fair: you would be getting a better model with SGD. I strongly suggest that you try out SGD in Graphlab-Create using the factorization recommender (https://dato.com/products/create/docs/generated/graphlab.recommender.factorizationrecommender.create.html#graphlab.recommender.factorizationrecommender.create).


User 6 | 3/20/2015, 1:14:07 PM

In each PowerGraph iteration, the nodes on either the left side or the right side of the bipartite graph are updated, so one ALS iteration is two PowerGraph iterations. However, --maxiter counts ALS iterations, so if you would like to run 10 ALS iterations, you set --maxiter=10 and PowerGraph will run for 20 internal iterations. I am not sure what happens in MLlib; you will need to investigate.
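
To make the counting concrete, here is a minimal pure-Python sketch of rank-1 ALS. It is not PowerGraph's or MLlib's actual code; the function name als_rank1, the toy ratings, and the reg value are all hypothetical. It counts each half-sweep (users, then items) separately, which is the sense in which one ALS iteration corresponds to two internal iterations.

```python
def als_rank1(ratings, num_users, num_items, als_iterations, reg=0.1):
    """Rank-1 ALS on a dict {(user, item): rating}.

    Returns (user_factors, item_factors, half_sweeps), where half_sweeps
    counts every one-sided update pass over the bipartite graph.
    """
    x = [1.0] * num_users   # one scalar factor per user (left side)
    y = [1.0] * num_items   # one scalar factor per item (right side)
    half_sweeps = 0

    for _ in range(als_iterations):
        # Half-sweep 1: hold item factors fixed, solve for each user factor.
        for u in range(num_users):
            num = sum(r * y[i] for (uu, i), r in ratings.items() if uu == u)
            den = reg + sum(y[i] ** 2 for (uu, i) in ratings if uu == u)
            x[u] = num / den
        half_sweeps += 1

        # Half-sweep 2: hold user factors fixed, solve for each item factor.
        for i in range(num_items):
            num = sum(r * x[u] for (u, ii), r in ratings.items() if ii == i)
            den = reg + sum(x[u] ** 2 for (u, ii) in ratings if ii == i)
            y[i] = num / den
        half_sweeps += 1

    return x, y, half_sweeps


# Toy data (hypothetical): 2 users, 2 items, 3 observed ratings.
toy_ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0}
user_f, item_f, sweeps = als_rank1(toy_ratings, 2, 2, als_iterations=10)
print(sweeps)  # 10 ALS iterations -> 20 half-sweeps
```

When comparing against another framework, it is worth checking whether its iteration parameter counts full sweeps or half-sweeps before matching the values.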