factorization model - too low precision-recall results for unobserved users

User 2126 | 8/5/2015, 7:40:48 AM


I'm trying to measure how my factorization model performs at recommending top-3 items for new, unobserved users. The history of user-item interactions for these new users is known (but was not part of training), and what I actually want to do is compare each user's last 3 actual interactions with the top-3 items recommended by the model. Schematically it looks like this:

history of interactions for new_user: item1, item2, item3, item4, item5
test dataset for new_user: item1, item2
evaluation dataset for new_user: item3, item4, item5
Task: compare the top-3 recommendations produced for new_user based on the observed items (item1, item2) with the real interactions (item3, item4, item5)
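For concreteness, this per-user split can be sketched in plain Python (the item IDs are the hypothetical ones from the schematic above):

```python
# Hypothetical interaction history for one new (held-out) user.
history = ["item1", "item2", "item3", "item4", "item5"]

# The first interactions form the test dataset (later fed to the model
# as new_observation_data); the last 3 are held out for evaluation.
test_items = history[:-3]   # ["item1", "item2"]
eval_items = history[-3:]   # ["item3", "item4", "item5"]
```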

The documentation says that if the new_observation_data parameter is provided (which in my case is the test dataset), then it is taken into account when producing recommendations (instead of simply returning recommendations based on popularity). However, the results seem to be too low. I start from this:

model = gl.ranking_factorization_recommender.create(sf_train, num_factors=30, ranking_regularization=1, side_data_factorization=False)

Then I do a precision-recall evaluation of my model by feeding the evaluation procedure the evaluation dataset with the last 3 user-item interactions (sf_eval) and the test dataset (sf_test):

res = model.evaluate_precision_recall(sf_eval, cutoffs=[3], exclude_known=False, new_observation_data=sf_test, verbose=False)

After that I compute total number of correct predictions as follows:

res['precision_recall_by_user']['precision'].sum() * 3
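This sum works because precision@3 times 3 is the number of top-3 hits for each user, so summing over users gives the total number of correct predictions. A minimal pure-Python illustration of the same arithmetic (all item IDs and recommendations here are hypothetical):

```python
def hits_at_k(recommended, actual, k=3):
    """Number of the top-k recommendations that appear in the user's
    held-out interactions; this equals precision@k * k for that user."""
    return len(set(recommended[:k]) & set(actual))

# Hypothetical top-3 recommendations vs. held-out items for two users.
recs   = {"u1": ["item3", "item9", "item4"], "u2": ["item7", "item8", "item2"]}
actual = {"u1": ["item3", "item4", "item5"], "u2": ["item1", "item2", "item6"]}

# u1 contributes 2 hits, u2 contributes 1.
total_correct = sum(hits_at_k(recs[u], actual[u]) for u in recs)
```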

which gives me a number close to 0 (like 1 or 2, sometimes even 0). A simple popularity model gives practically the same low result (1 correct prediction). I'm sure the number of correct predictions should be higher, since with a simple sparse SVD of rank 30 I get around 20 correct predictions on the same data.

So probably there's something wrong with the computation of predictions for unobserved users. I couldn't find in the docs how this is actually computed. Could you point me to a detailed explanation, or show me how to compute predictions correctly on unobserved data?

Thanks, Evgeny


User 1207 | 8/6/2015, 9:03:23 PM

Hello Evgeny,

Currently, new users and new observation vectors in the factorization recommender simply use a global default; unfortunately, it does not currently rebuild part of the model or choose the user latent factors intelligently based on the new data. Adding that functionality is something we are currently working on, but there are a few subtle issues to resolve to ensure it's done well in our models. We hope this will be a seamless feature in our recommender soon.

The item_similarity model makes the best use of new user data and new observation data, and it performs approximately as well as a model trained with that new data included. If that is an option for you, then I'd recommend using that model.

A more sophisticated option, which some of our users have adopted and which works quite well in this context, is to train both an item_similarity model and a factorization model, then use the item_similarity model to generate candidates for the factorization model when there are multiple users. The output of recommend() from one model can be passed directly to the "items" argument of another model's recommend() call to restrict the set of candidate items.
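The two-stage setup described above (candidate generation followed by re-scoring) can be illustrated outside GraphLab with plain Python; all item IDs and scores below are hypothetical stand-ins for the two models' outputs:

```python
# Stage 1: a (hypothetical) item-similarity model proposes candidates,
# playing the role of the first model's recommend() output.
candidates = ["item4", "item7", "item3", "item9"]

# Stage 2: a (hypothetical) factorization model scores each candidate;
# only the candidate set is ranked, mirroring recommend(items=...).
factor_scores = {"item3": 0.9, "item4": 0.6, "item7": 0.2, "item9": 0.8}

top_3 = sorted(candidates, key=lambda i: factor_scores[i], reverse=True)[:3]
# top_3 is ["item3", "item9", "item4"]
```

The point of the restriction is efficiency and quality: the similarity model narrows the item set cheaply, and the factorization model only has to rank that short list.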

-- Hoyt