That's interesting! We haven't observed that particular behavior with NMF+SGD. But we'll play with it. Until now we haven't seen a compelling reason to bring in ALS. But this could be that reason. Thanks for the tip.
Using the item similarity matrix as initialization for the factorization machine is an interesting idea. You can retrieve the ranked similarity table from the item similarity recommender using <a href="http://graphlab.com/products/create/docs/generated/graphlab.recommender.item_similarity_recommender.ItemSimilarityRecommender.get_similar_items.html#graphlab.recommender.item_similarity_recommender.ItemSimilarityRecommender.get_similar_items">get_similar_items</a>. (This will be expensive if there are a large number of items.) You can then feed it into the factorization recommender as item side data. A note of caution: factorization machines can be finicky to tune because they have so many degrees of freedom. They are also quirky in that they can have trouble absorbing numeric side features, so we recommend binning real-valued numeric features and converting the bin numbers to string type, which will then be treated as categorical variables. I would also start with plain matrix factorization instead of the full factorization machine. In GLC you can get MF by setting side_data_factorization=False in factorization_recommender.create().
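To make the binning suggestion concrete, here is a minimal sketch of turning a real-valued feature into string-typed bin labels before handing it to the recommender as side data. The function name and bin-label format are my own illustration, not part of the GLC API; only the "bin, then stringify" idea comes from the advice above.

```python
import numpy as np

def bin_to_categorical(values, n_bins=5):
    """Bin real-valued features and return the bin ids as strings,
    so downstream models treat them as categorical, not numeric."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # np.digitize against the interior edges yields indices 0..n_bins-1
    idx = np.digitize(values, edges[1:-1])
    return np.array(["bin_%d" % i for i in idx])

prices = np.array([0.99, 4.50, 12.00, 7.25, 99.99])
print(bin_to_categorical(prices, n_bins=5))
# -> ['bin_0' 'bin_0' 'bin_0' 'bin_0' 'bin_4']
```

The resulting string column can then go into the item side-data table you pass to the factorization recommender.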
The alternative you mentioned--using them as initial values for the item factors--is less viable in the current version of GLC, since we don't yet allow hard-coded initial values for the factorization machine. Aside from that, I'm also not sure how the proposal would work. Item similarity gives you the full similarity matrix (or the top k similar items for each item), which is not the same as the latent vector representation in a factorization machine. I think you'd have to do an eigendecomposition of the similarity matrix to get some latent factors, but then the factors won't be non-negative. Am I understanding you correctly?
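To illustrate the eigendecomposition point: for a symmetric similarity matrix S you can take S ≈ V·diag(w)·Vᵀ and use the top-k scaled eigenvectors as k-dimensional item factors. The matrix below is a made-up toy similarity matrix, just to show that the resulting factors generally contain negative entries (the second eigenvector must be orthogonal to the first, all-positive one).

```python
import numpy as np

# Toy symmetric item-item similarity matrix (hypothetical values).
S = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

# Eigendecomposition of a symmetric matrix: S = V diag(w) V^T.
w, V = np.linalg.eigh(S)

# Keep the top-k eigenpairs and fold the eigenvalues into the vectors,
# so that factors @ factors.T approximates S.
k = 2
order = np.argsort(w)[::-1][:k]
factors = V[:, order] * np.sqrt(np.maximum(w[order], 0.0))

print(factors)
print((factors < 0).any())  # -> True: the factors are not non-negative
```

This is exactly why such factors can't serve as a drop-in initialization for an NMF-style model.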
Out of curiosity, when you say you plotted the item similarity matrix and looked at the variances, do you mean you computed the variance of each column (or, equivalently, row) of the matrix, or did you do PCA or some other dimensionality reduction and look at the resulting latent-space factors of the items?
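For clarity, here is a quick sketch of the two analyses I have in mind, on a randomly generated stand-in for the similarity matrix (the data is synthetic; only the two computations matter):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for an item-item similarity matrix over 50 items.
X = rng.random((50, 8))
S = X @ X.T / 8.0  # symmetric, similarity-like

# Interpretation 1: variance of each column (= row, by symmetry) of S.
col_var = S.var(axis=0)

# Interpretation 2: PCA -- variance explained along each latent direction,
# i.e. eigenvalues of the covariance of the (column-centered) matrix.
S_centered = S - S.mean(axis=0)
cov = S_centered.T @ S_centered / (len(S) - 1)
explained = np.sort(np.linalg.eigvalsh(cov))[::-1]

print(col_var[:3])    # one number per item
print(explained[:3])  # one number per latent direction
```

The first tells you which items have unusually spread-out similarity profiles; the second tells you how much of the matrix's structure a few latent directions capture.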