User 2126 | 8/5/2015, 7:40:48 AM
I'm trying to measure how my factorization model performs at recommending top-3 items for new, unobserved users. The history of user-item interactions for these new users is known (but was not part of training), and what I actually want to do is compare each user's last 3 actual interactions with the top-3 items recommended by the model. Schematically it looks like this:
history of interactions for new_user: item1, item2, item3, item4, item5
test dataset for new_user: item1, item2
evaluation dataset for new_user: item3, item4, item5
Task: compare top-3 recommendations produced for new_user based on observed items (item1, item2) with the real interactions (item3, item4, item5)
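To make the split above concrete, here is a minimal sketch in plain Python (no GraphLab involved). The function name `split_history` and the `(user, item)` tuple layout are only illustrative assumptions; it also assumes the interactions are already ordered by time within each user.

```python
from collections import defaultdict

def split_history(interactions, n_holdout=3):
    """Split each user's time-ordered items into an observed part
    (the "test" data) and a held-out tail (the "evaluation" data)."""
    by_user = defaultdict(list)
    for user, item in interactions:
        by_user[user].append(item)

    observed, held_out = {}, {}
    for user, items in by_user.items():
        if len(items) <= n_holdout:
            # Too little history: nothing would be left as observed data.
            continue
        observed[user] = items[:-n_holdout]
        held_out[user] = items[-n_holdout:]
    return observed, held_out

# Hypothetical data matching the schematic above.
interactions = [("new_user", "item%d" % i) for i in range(1, 6)]
observed, held_out = split_history(interactions)
```

Here `observed` plays the role of the test dataset and `held_out` the role of the evaluation dataset.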
The documentation says that if the new_observation_data parameter is provided (which in my case is the test dataset), then it is taken into account when producing recommendations (instead of simply returning recommendations based on popularity). However, the results seem to be too low. I start with this:
model = gl.ranking_factorization_recommender.create(sf_train, num_factors=30, ranking_regularization=1, side_data_factorization=False)
Then I run a precision-recall evaluation of the model, passing to the evaluation procedure the evaluation dataset with the last 3 user-item interactions (sf_eval) and the test dataset (sf_test):
res = model.evaluate_precision_recall(sf_eval, cutoffs=, exclude_known=False, new_observation_data=sf_test, verbose=False)
After that I compute the total number of correct predictions as follows:
res['precision_recall_by_user']['precision'].sum() * 3
which gives me a number close to 0 (like 1 or 2, sometimes even 0). A simple popularity model gives practically the same low result (1 correct prediction). I'm sure the number of correct predictions should be higher, since with a simple sparse SVD of rank 30 I get around 20 correct predictions on the same data.
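As a sanity check on that count, the same number can be computed by hand from any top-3 lists, independently of the library. This is only a sketch: `count_hits`, `precision_at_k` and the sample dictionaries are hypothetical, and it assumes each recommendation list has no duplicate items.

```python
def count_hits(recommended, held_out, k=3):
    """Total number of held-out items that appear in each user's top-k list."""
    hits = 0
    for user, actual in held_out.items():
        top_k = recommended.get(user, [])[:k]
        hits += len(set(top_k) & set(actual))
    return hits

def precision_at_k(recommended, held_out, k=3):
    """Per-user precision at cutoff k: hits divided by k."""
    return {
        user: len(set(recommended.get(user, [])[:k]) & set(actual)) / float(k)
        for user, actual in held_out.items()
    }

# Hypothetical top-3 lists and held-out interactions for two users.
recommended = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
held_out = {"u1": ["b", "c", "d"], "u2": ["p", "q", "r"]}

total_hits = count_hits(recommended, held_out)
# Summing per-user precision and multiplying by k should give the same count.
from_precision = sum(precision_at_k(recommended, held_out).values()) * 3
```

With a cutoff of 3, `total_hits` and `from_precision` should agree, which is the relation I rely on when multiplying the precision sum by 3.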
So there is probably something wrong with how predictions are computed on the unobserved data. I couldn't find out from the docs how this is actually computed. Could you point me to a detailed explanation of this, or show me how to calculate predictions correctly on unobserved data?