Recommender System for Products. Review - User,Item,Rating, Categories.

User 1893 | 5/6/2015, 10:28:12 PM

I'm given a file where each row corresponds to a review. Each row has User, Item, Rating, Categories. (IMPORTANT - The number of items is very low, but almost every User has reviewed only 1-2 items.)

I'm also given a test file, where each row is a User-Item pair. I have to predict whether a User will review/buy the specified Item (1 if he will, 0 otherwise).

For this I made a recommender (it automatically chose a Ranking Factorization model) and passed the categories for each item as item_data. Then I try to predict the rating for each User-Item pair, and if the rating is >= 3 or so, I say he will buy it.

It seems that the rating is very high for totally non-related items as well. How do I fix this?


User 1207 | 5/7/2015, 4:53:14 PM


Having only a few ratings per user might be challenging for the ranking factorization model, as it has to learn a number of parameters associated with each user (the weight and a latent factor). If it doesn't have enough data to learn the latent factors associated with each user, then it will predict fairly random results on users/items unrelated to the data it's seen. However, if you increase the regularization value, you can control this behavior somewhat. Decreasing the number of latent factors is a significant help as well. I'd suggest trying num_factors = 4, regularization = 0.001, and ranking_regularization = 0.25, then increasing/decreasing them until you see the results behaving badly. It may take a bit of playing around to get this right, especially in your case where you have less data.
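As a rough sketch of that "playing around", the snippet below builds a small grid of settings around the suggested starting point. The neighbouring values are illustrative assumptions, not recommendations. Each dict would then be passed as keyword arguments when creating the model and compared on a held-out set; that step is left out here because it depends on your data and library version.

```python
from itertools import product

# Suggested starting point (4, 0.001, 0.25) plus illustrative neighbours.
NUM_FACTORS = [2, 4, 8]
REGULARIZATION = [1e-4, 1e-3, 1e-2]
RANKING_REGULARIZATION = [0.1, 0.25, 0.5]


def parameter_grid():
    """Yield one dict of keyword arguments per hyperparameter combination."""
    for nf, reg, rank_reg in product(
        NUM_FACTORS, REGULARIZATION, RANKING_REGULARIZATION
    ):
        yield {
            "num_factors": nf,
            "regularization": reg,
            "ranking_regularization": rank_reg,
        }


grid = list(parameter_grid())
start = {"num_factors": 4, "regularization": 1e-3, "ranking_regularization": 0.25}
print(len(grid), "candidate settings; starting point included:", start in grid)
```

With this little data per user, it is probably worth evaluating the smaller factor counts and larger regularization values first.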

In your case, a better model is likely the item similarity model. You wouldn't input the category information for the items, but you may still find that the recommender will be able to learn a great model with just the user-item interactions.
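To give a feel for why item similarity can work from interactions alone, here is a minimal pure-Python sketch that scores a user/item pair by the average Jaccard similarity between the candidate item and the items the user has already reviewed. This is an illustration under my own assumptions (Jaccard similarity, simple averaging, made-up toy data), not the library's actual implementation.

```python
from collections import defaultdict


def build_item_users(interactions):
    """Map each item to the set of users who reviewed it."""
    item_users = defaultdict(set)
    for user, item in interactions:
        item_users[item].add(user)
    return item_users


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def score(user, item, interactions):
    """Average similarity between `item` and the user's other reviewed items."""
    item_users = build_item_users(interactions)
    seen = {i for u, i in interactions if u == user and i != item}
    if not seen:
        return 0.0  # no history to compare against
    target = item_users.get(item, set())
    return sum(jaccard(target, item_users[i]) for i in seen) / len(seen)


# Toy data, made up for illustration: (user, item) review pairs.
interactions = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "C"),
    ("u4", "C"),
]

print(score("u4", "A", interactions))  # 0.25: A and C share reviewer u3
print(score("u4", "B", interactions))  # 0.0: B and C share no reviewers
```

Note that no category information is used anywhere; the score comes purely from which users reviewed which items.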

User 1893 | 5/10/2015, 9:31:13 PM

Hi Hoytak!

Thanks for the quick response! I changed my model to ItemSimilarity, and it seems to be better for me. However, precision and recall are still very low (0.004, 0.008 ...).

Basically, what I'm doing is getting the top 20,000 recommended items for the particular user, and if the item from the user-item pair is in this list, I say he would buy it. It's still not giving me the best results...

I have 1 million reviews, if that helps, with about 800,000 unique users and about 225,000 items.

User 1207 | 5/14/2015, 11:56:43 PM

Hello Maddymanu,

That's not much data for that many users. It's possible that your model is actually doing reasonably well. If you want to email us directly, we would be happy to take a look at your data to see if there are ways to improve it.
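To put numbers on that, here is the quick arithmetic using the figures quoted above (1 million reviews, about 800,000 users, about 225,000 items):

```python
n_reviews = 1_000_000
n_users = 800_000
n_items = 225_000

reviews_per_user = n_reviews / n_users        # average history length per user
reviews_per_item = n_reviews / n_items        # average number of reviews per item
density = n_reviews / (n_users * n_items)     # fraction of user-item pairs observed

print(f"{reviews_per_user:.2f} reviews per user")   # 1.25 reviews per user
print(f"{reviews_per_item:.2f} reviews per item")   # 4.44 reviews per item
print(f"{density:.2e} of all pairs observed")       # 5.56e-06 of all pairs observed
```

With barely more than one review per user on average, most users give the model almost nothing to generalize from.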

As for seeing if a user buys a particular item, it is likely that the predict() method of a recommender model will give you a better result. This returns a score for each user-item pair, which you can then threshold to get a yes/no result. Some tuning may be needed to find a good threshold, but you may find it productive. Treat pairs with a score above the constant as likely to buy, and pairs with a score below it as not likely to buy.
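One possible way to tune that constant is sketched below. It assumes you have a validation set of user-item pairs with known 0/1 labels and the model's score for each pair (the numbers here are made up), and it picks the cut-off with the best F1. F1 is just one reasonable choice of metric, not something prescribed by the library.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the rule 'predict 1 when score > threshold'."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Try each distinct score as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Hypothetical validation data: model scores and whether the user bought the item.
scores = [0.05, 0.10, 0.40, 0.55, 0.70, 0.90]
labels = [0, 0, 0, 1, 1, 1]

threshold = best_threshold(scores, labels)
print(threshold, f1_at_threshold(scores, labels, threshold))  # 0.4 1.0
```

If your data only contains positive examples (reviews that happened), you would need to sample some unobserved pairs as negatives for the validation set; how you sample them will affect the threshold you end up with.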

Hope that helps! -- Hoyt

User 4931 | 4/17/2016, 1:18:44 AM

Hello @hoytak and @maddymanu,

I have the same task. I have 1 million reviews with 0.5 million unique users and 0.5 million unique items. For a user-item pair, I want to predict whether the user will buy/review this specific item. Using the item similarity model, I get 20,000 products for each user and check whether the specific product is among these recommended products: if so it's 1, else 0. @maddymanu, what was the success rate for this model? Any new approach to tackle this problem would be really helpful. Thank you.

User 1207 | 4/29/2016, 8:45:08 PM

Hello @vsanghvi007 -- sorry to not get back to you earlier, I somehow missed seeing your question.

I'm not exactly sure what you mean by saying that you have 20000 products for each user. Do you mean that you ask for 20000 recommendations for each user, then use membership in that list as the prediction? If you use the predict method instead of the recommend method, you get the specific score for a given user/item pair, which seems to be what you are looking for.
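To illustrate the difference, here is a small sketch with made-up scores for a single user (the two helper functions are hypothetical stand-ins, not library calls). A top-k membership check only tells you the item's rank, so with a large k it can say "yes" for an item whose score is very low, while the direct score lets you apply a threshold.

```python
def in_top_k(user_scores, item, k):
    """recommend-style check: is `item` among the user's k highest-scored items?"""
    ranked = sorted(user_scores, key=user_scores.get, reverse=True)
    return item in ranked[:k]


def pair_score(user_scores, item):
    """predict-style lookup: the score for this one user/item pair."""
    return user_scores[item]


# Hypothetical scores for one user over a four-item catalogue.
user_scores = {"A": 0.91, "B": 0.40, "C": 0.02, "D": 0.01}

print(in_top_k(user_scores, "C", k=3))  # True, although the score is tiny
print(pair_score(user_scores, "C"))     # 0.02
```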

Hope that helps! -- Hoyt