Using ALS instead of SGD

User 888 | 10/28/2014, 9:35:34 PM

Hi --

I was wondering how I can change the default settings for non-negative matrix factorization so that it uses alternating least squares (ALS) optimization rather than stochastic gradient descent (SGD).




User 18 | 10/28/2014, 10:24:34 PM


Currently GraphLab Create does not implement ALS for matrix factorization. Is there a problem with SGD? If so, please let us know.

Thanks! Alice

User 888 | 11/4/2014, 9:23:16 PM

Hi --

Thank you for the response.

As far as I can tell, the problem in my case with NMF using SGD is that it ends up in a local optimum, especially when the data is sparse. The local-optimum solution is good at distinguishing between the users but not so good at distinguishing between the items. When I plot the W (item factor) matrix, most of the variance is represented among a few items and the rest of the items have small variance in the latent-dimension projection. Purely based on experiment, ALS does a better job of preserving the variance in the lower-dimensional space.
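For concreteness, here is a minimal numpy sketch (not GraphLab code) of the ALS-style update I have in mind: each half-step is a closed-form least-squares solve, projected back onto the non-negative orthant by clipping. Real NMF-ALS solvers use proper non-negative least squares per step; the clipping here is a simplification.

```python
import numpy as np

def nmf_als(X, k, n_iter=50, seed=0):
    """Factor X ~= W @ H with W, H >= 0 via projected ALS.

    Each half-step solves an ordinary least-squares problem in closed
    form, then clips negative entries to zero (a simple projection;
    production solvers use true non-negative least squares instead).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Fix W, solve W @ H = X for H in the least-squares sense, project.
        H = np.linalg.lstsq(W, X, rcond=None)[0].clip(min=0)
        # Fix H, solve H.T @ W.T = X.T for W.T the same way, project.
        W = np.linalg.lstsq(H.T, X.T, rcond=None)[0].T.clip(min=0)
    return W, H
```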

Your item_similarity_recommender does a really good job regardless of the sparsity, without any optimization problem (local optima). I was wondering whether there is a way we can either feed the results from the item_similarity_recommender into the factorization_recommender as initial values for the W matrix, or retrieve the item similarity matrix as an adjacency matrix from the item similarity model so I can use it as side data for the factorization_recommender.
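To illustrate the second option, here is a plain-Python sketch that collapses ranked-similarity rows into one dict-valued adjacency feature per item. The (item, similar item, score) row shape and the values are my assumptions, not actual GraphLab output:

```python
# Hypothetical ranked-similarity rows: (item, similar_item, score).
rows = [
    ("a", "b", 0.9), ("a", "c", 0.4),
    ("b", "a", 0.9), ("b", "c", 0.2),
]

def to_adjacency(rows):
    """Collapse ranked (item, similar, score) rows into a dict-valued
    feature per item, usable as sparse side data for a factorization model."""
    adj = {}
    for item, similar, score in rows:
        adj.setdefault(item, {})[similar] = score
    return adj

adj = to_adjacency(rows)
# adj["a"] == {"b": 0.9, "c": 0.4}
```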



User 18 | 11/6/2014, 12:49:44 AM

Hi falakmasir,

That's interesting! We haven't observed that particular behavior with NMF+SGD. But we'll play with it. Until now we haven't seen a compelling reason to bring in ALS. But this could be that reason. Thanks for the tip.

Using the item similarity matrix as initialization for the factorization machine is an interesting idea. You can retrieve the ranked similarity table from item_similarity_recommender using get_similar_items. (This will be expensive if there are a large number of items.) You can then feed it into the factorization recommender as side data.

A note of caution: factorization machines can be finicky to tune because they have many degrees of freedom. They are also quirky in that they can have trouble absorbing numeric side features, so we recommend binning real-valued numeric features and converting the bin numbers to string type, which will then be treated as categorical variables. I would also start with plain matrix factorization instead of the full factorization machine. In GLC you can get MF by setting side_data_factorization=False in factorization_recommender.create().
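As an illustration of the binning advice, a small numpy sketch (the feature values are made up):

```python
import numpy as np

# Hypothetical real-valued side feature, e.g. a per-item similarity score.
scores = np.array([0.05, 0.12, 0.37, 0.41, 0.88, 0.93])

# Bin into quartile-style buckets, then convert bin ids to strings so a
# factorization model treats them as categorical rather than numeric.
edges = np.quantile(scores, [0.25, 0.5, 0.75])
bins = np.digitize(scores, edges)
categorical = ["bin_%d" % b for b in bins]
# categorical -> ["bin_0", "bin_0", "bin_1", "bin_2", "bin_3", "bin_3"]
```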

The alternative you mentioned--using them as initial values for the item factors--is less viable in the current version of GLC, since we don't yet allow hard-coded initial values for the factorization machine. Aside from that, I'm also not sure how the proposal would work. Item similarity gives you the full similarity matrix (or the top k similar items for each item), which is not the same as the latent vector representation in factorization machines. I think you'd have to do an eigendecomposition of the similarity matrix to get latent factors, but then the factors won't be non-negative. Am I understanding you correctly?
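To make that mismatch concrete, here is a small numpy sketch: eigendecomposing a symmetric similarity matrix (this one is made up) does yield latent factors, but they are not non-negative:

```python
import numpy as np

# Hypothetical symmetric item-item similarity matrix for 4 items.
S = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

# eigh returns eigenvalues in ascending order; take the top-k pairs.
k = 2
vals, vecs = np.linalg.eigh(S)
top = np.argsort(vals)[::-1][:k]

# Scale eigenvectors by sqrt of eigenvalues so factors @ factors.T
# approximates S. Note the result contains negative entries: the second
# eigenvector must change sign to stay orthogonal to the first.
factors = vecs[:, top] * np.sqrt(vals[top])
```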

Out of curiosity, when you say you plotted the item similarity matrix and looked at the variances, do you mean you computed the variance of each column (or, equivalently, row) of the matrix, or did you do PCA or some other dimensionality reduction and look at the resulting latent factors of the items?

Best, Alice

User 888 | 11/6/2014, 1:47:28 AM

Hi Alice,

Thanks for the response.

Currently I am calculating a global score based on the similarity of my items outside of the GraphLab toolkit, and I use it as a side feature in the NMF model. I wanted to use a small sample of my dataset and then plot the graph or the similarity matrix for data exploration. Eigendecomposition would be a way to calculate another similarity score for my items to use as a side feature.

About plotting the variance: I used a simple imshow to plot the W matrix of the NMF with 5 factors, and most of the variance was represented in one of the factors while the other factors had minute variance, so I figured that might be due to reaching a local optimum.
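For reference, here is roughly how the per-factor variance can be checked numerically rather than by eye (a numpy sketch with a synthetic W standing in for my real data; the dominance of the first factor is built in):

```python
import numpy as np

# Synthetic 100-item x 5-factor W matrix where one factor dominates,
# mimicking the collapsed solution described above.
rng = np.random.default_rng(0)
W = np.abs(rng.normal(size=(100, 5))) * np.array([5.0, 0.1, 0.1, 0.1, 0.1])

# Fraction of total variance captured by each latent factor (column of W).
var = W.var(axis=0)
share = var / var.sum()
# share[0] is close to 1: nearly all the variance sits in one factor.
```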

I believe everyone would appreciate it if you could link GraphLab to the sparse SVD libraries.
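For example, scipy already exposes a truncated sparse SVD; a minimal sketch (not GraphLab code, and the matrix here is random):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# A sparse 100 x 80 matrix with ~5% nonzeros.
A = sparse_random(100, 80, density=0.05, random_state=0, format="csr")

# Truncated SVD: only the top-k singular triplets are computed,
# without ever densifying A.
k = 5
u, s, vt = svds(A, k=k)
# Note: svds returns the singular values in ascending order.
```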

Thanks again, Best,