popularity_recommender, observational data vs side data

User 2568 | 5/10/2016, 1:59:14 AM

I've been experimenting with the popularity_recommender to predict hotel selection using the Expedia log data. I found that adding user and item side data doubles the validation score (MAP@5), which is probably expected.

What I don't get is why popularity_recommender doesn't support observational data. My understanding is that user data is specific to the user, like their country or city, and item data is specific to the item, e.g., hotel country. Then there is observational data beyond the user ID that is specific to the log entry, e.g., day of week or hour, device attributes, or search parameters like destination.
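To make the three kinds of data concrete, here is a small sketch in plain Python (not GraphLab Create). All column names and values are made up for illustration and are not taken from the Expedia data.

```python
# Hypothetical column names, loosely modeled on a hotel search log.

# Observation data: one row per log entry. Besides the user and item ids,
# a row may carry features that belong only to that one event.
observations = [
    {"user_id": "u1", "item_id": "h10", "day_of_week": "Sat", "is_mobile": True},
    {"user_id": "u1", "item_id": "h11", "day_of_week": "Mon", "is_mobile": False},
    {"user_id": "u2", "item_id": "h10", "day_of_week": "Sat", "is_mobile": False},
]

# User side data: one entry per user, keyed by user id.
user_data = {"u1": {"user_country": "AU"}, "u2": {"user_country": "US"}}

# Item side data: one entry per item, keyed by item id.
item_data = {"h10": {"hotel_country": "FR"}, "h11": {"hotel_country": "JP"}}


def join_side_data(observations, user_data, item_data):
    """Attach user and item attributes to each log row by id."""
    joined = []
    for row in observations:
        full = dict(row)  # copy so the original log row is left untouched
        full.update(user_data.get(row["user_id"], {}))
        full.update(item_data.get(row["item_id"], {}))
        joined.append(full)
    return joined


rows = join_side_data(observations, user_data, item_data)
```

After the join, each row holds per-event features (day_of_week), per-user features (user_country) and per-item features (hotel_country) side by side.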

My question is: why doesn't popularity_recommender support this additional observation data? Is it not appropriate, or is it just not implemented?

My thought is that I probably need to look at using ranking_factorization_recommender; however, I'm still working up to that.

Comments

User 1207 | 5/11/2016, 6:52:18 PM

Hello Kevin,

Currently, the popularity recommender just uses the global popularity measures, but adding features to handle your use case is on our immediate roadmap, so expect some new stuff coming out soon. Thank you for the suggestions!

-- Hoyt


User 2568 | 5/11/2016, 11:32:56 PM

Can you explain how the popularity recommender uses 1. additional observational data, i.e., not user, item or target? 2. user and item side data?

These are not well covered in any of the documentation I can find.

BTW, in the Expedia Kaggle comp, popularity recommender seems to give better results than the other recommenders.


User 1207 | 5/12/2016, 6:42:32 PM

@Kevin_McIsaac,

Thanks for the thoughts. Right now, the popularity recommender is really simple. It either averages each item's ratings over all users, or, if there are no ratings, counts the occurrences of each item in the dataset. It doesn't use any side data; essentially, it's meant simply as a baseline to know how much other models are improving over it. However, adding the exact features you've mentioned is on our roadmap, so this will hopefully improve greatly soon.
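As a rough illustration of that scoring, here is a minimal sketch in plain Python. It is not the GraphLab Create implementation; the function names, the row layout and the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter, defaultdict


def popularity_scores(observations, target=None):
    """Score items globally (hypothetical helper, not a GraphLab API).

    observations: list of dicts, each with an "item_id" key (assumed layout).
    target: name of the rating column, or None to count occurrences.
    """
    if target is None:
        # No ratings: score = number of times the item appears in the log.
        return dict(Counter(row["item_id"] for row in observations))

    # Ratings present: score = mean rating of the item over all rows.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in observations:
        totals[row["item_id"]] += row[target]
        counts[row["item_id"]] += 1
    return {item: totals[item] / counts[item] for item in totals}


def recommend(scores, k=5):
    """Return the top-k items; the list is the same for every user."""
    # Ties are broken by item id, an arbitrary choice for this sketch.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


log = [
    {"user_id": "u1", "item_id": "h10", "rating": 4.0},
    {"user_id": "u2", "item_id": "h10", "rating": 2.0},
    {"user_id": "u2", "item_id": "h11", "rating": 5.0},
]

count_scores = popularity_scores(log)                  # occurrence counts
mean_scores = popularity_scores(log, target="rating")  # mean ratings
```

Note that the user id never enters the score, which is why side data and per-event features have no effect in this kind of model.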

If you have specific needs or references for us in implementing it, that would be helpful, and we'd definitely take that into account, as we want to have the best recommenders possible :-).

Thanks! -- Hoyt


User 2568 | 5/13/2016, 3:40:40 AM

@hoytak Got it.

The documentation on PopularityRecommender spells this out more clearly than [popularity_recommender.create](https://dato.com/products/create/docs/generated/graphlab.recommender.popularityrecommender.create.html#graphlab.recommender.popularity_recommender.create). Indeed, after reading the documentation for [popularity_recommender.create](https://dato.com/products/create/docs/generated/graphlab.recommender.popularityrecommender.create.html#graphlab.recommender.popularity_recommender.create) I thought it used user and item side data, and I wasted some time on that.

To avoid all doubt, may I suggest that:

1. You add some of the text from the PopularityRecommender docs to the popularity_recommender.create docs, i.e.,

    Items are scored by the number of times they are seen in the training set or on the average of the target. Hence scores are the same for all users and recommendations are not tailored to individuals. Popularity Recommender is simple, fast, provides a reasonable baseline and can work well when observation data is sparse.

2. You borrow the following from item_similarity_recommender.create to clarify how side data is handled:

    Notes: Currently, popularity_recommender does not leverage the use of side features user_data and item_data.


User 1207 | 5/16/2016, 8:04:33 PM

@Kevin_McIsaac,

Thanks for the feedback. We'll fix these issues. Thank you for all your contributions!

-- Hoyt