evaluating recommenders

User 5273 | 6/8/2016, 6:11:09 PM

Hi all, the whole world of recommenders is new to me, so please bear with me and help me with my question. In classic machine learning algorithms we have examples X with labels Y. To evaluate a prediction (hypothesis) function f, we compare Y[i] with f(X[i]). However, a recommendation algorithm has no Y labels; it only fills in missing ratings per item and user. What is compared in the test examples to let us evaluate the algorithm, and how is it done?

P.S. I have googled the question and read the docs I found on the provided evaluation functions (`evaluate_rmse`, `evaluate_precision_recall`).

I really appreciate any help in this matter. Thanks!


User 1207 | 6/9/2016, 7:44:35 PM

Hello @jony,

There are two ways we can look at a recommender system. The RMSE case comes into play when you try to predict the rating a user will give a particular item. In that case, it's exactly as you describe.
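For that first case, RMSE is computed exactly as in ordinary regression. Here is a minimal sketch in plain Python, using made-up ratings (the numbers and variable names are illustrative, not from any particular dataset):

```python
import math

# Hypothetical held-out (user, item) pairs: the true ratings the users gave,
# alongside the model's predicted ratings for those same pairs.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]

# RMSE: square root of the mean squared difference between Y[i] and f(X[i]).
rmse = math.sqrt(
    sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
)
print(round(rmse, 4))  # 0.6124
```

So the held-out ratings play exactly the role of Y in the classic setup.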

The second, and far more common, case is that you want to recommend items that the user will rate highly. To set this up for evaluation, we generate a test set consisting of a random subset of items from a randomly selected subset of users.
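The split described above can be sketched as follows; this is an illustrative stand-alone version (the interaction data and the 50% holdout fraction are assumptions for the example, not fixed by any library):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical interaction log: (user, item) pairs.
interactions = [
    ("user1", "item1"), ("user1", "item3"),
    ("user2", "item2"), ("user2", "item3"),
    ("user3", "item1"), ("user3", "item2"),
]

# Pick a random subset of users, then hold out a random subset of each
# chosen user's items as the test set; everything else stays in training.
users = sorted({u for u, _ in interactions})
test_users = set(random.sample(users, k=2))

train, test = [], []
for user, item in interactions:
    if user in test_users and random.random() < 0.5:
        test.append((user, item))
    else:
        train.append((user, item))

# Train and test partition the original interactions.
print(len(train) + len(test) == len(interactions))  # True
```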

Now, there is a Y in this context, but it's implicit. Suppose that instead of just working with the list of user-item interactions, we looked at all possible user/item pairs, and gave each pair a label of 1 if the user had interacted with that item, and 0 if the user had not interacted with it. In other words, instead of

user1, item1
user1, item3
user2, item2
user2, item3

you would have something like

user1, item1, 1
user1, item2, 0
user1, item3, 1
user2, item1, 0
user2, item2, 1
user2, item3, 1

Then this fits the X, Y framework.

Now, metrics like precision and recall work with sets, and in this case the relevant set is each test user's rated items. So it's a different kind of metric, but it essentially measures how accurate your 1/0 predictions were for each item in the test set.
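To make the set-based view concrete, here is a minimal precision/recall sketch for a single test user; the item names and top-3 recommendation list are made up for illustration:

```python
# Hypothetical top-k recommendations for one test user, versus the set of
# items that user actually interacted with in the test set.
recommended = ["item1", "item4", "item3"]
relevant = {"item1", "item2", "item3"}

hits = sum(1 for item in recommended if item in relevant)  # 2 of 3 recommendations hit

precision = hits / len(recommended)  # fraction of recommendations that were relevant
recall = hits / len(relevant)        # fraction of relevant items that were recommended

print(round(precision, 4), round(recall, 4))  # 0.6667 0.6667
```

In practice these per-user scores are averaged over all users in the test set, often at several cutoffs k.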

Hope that helps! -- Hoyt