Question about the GraphLab collaborative filtering library

User 945 | 11/14/2014, 7:24:23 AM

I was using the collaborative filtering library. I ran WALS on an implicit feedback dataset with around 2 million users. The problem I face now is that generating the top 10 recommendations using rating takes approximately 15-20 seconds for a single user. At this rate, it might take many months to run for all 2 million users. Can you suggest how to make this faster?
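To put the "many months" figure in concrete terms, here is a back-of-envelope estimate in Python. It uses only the numbers quoted above (2 million users, 15-20 seconds per user) and assumes users are processed strictly one after another; the function name is made up for illustration.

```python
# Back-of-envelope estimate using only the figures quoted in the post.
NUM_USERS = 2_000_000
SECONDS_PER_DAY = 24 * 60 * 60

def sequential_days(seconds_per_user, num_users=NUM_USERS):
    """Days needed to score every user one after another (no parallelism)."""
    return seconds_per_user * num_users / SECONDS_PER_DAY

low = sequential_days(15)
high = sequential_days(20)
print(f"{low:.0f} to {high:.0f} days")  # prints "347 to 463 days"
```

So a strictly sequential run would take roughly a year or more, which is why the per-user cost has to come down or the work has to be spread across workers.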

Comments

User 6 | 11/14/2014, 8:02:18 AM

I suggest switching to GraphLab Create, where we have an efficient way of computing top K recommendations per user. We are going to add weighted ALS support very soon.
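For context, top K recommendation from a factorization model comes down to scoring each candidate item by the inner product of the user's and the item's latent factors and keeping the K best. The sketch below is a generic pure-Python illustration of that idea, not GraphLab Create's implementation; the names `dot` and `top_k_items` and the toy factors are hypothetical.

```python
import heapq

def dot(u, v):
    """Inner product of two equal-length latent vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k_items(user_factor, item_factors, k=10, exclude=()):
    """Return the k highest-scoring (item_id, score) pairs for one user.

    item_factors: dict mapping item_id -> latent vector (list of floats).
    exclude: item ids to skip, e.g. items the user already interacted with.
    """
    skip = set(exclude)
    scored = ((item_id, dot(user_factor, vec))
              for item_id, vec in item_factors.items()
              if item_id not in skip)
    # nlargest tracks only k candidates at a time instead of sorting all items.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Toy data, made up for illustration only.
item_factors = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
    "d": [-1.0, 0.5],
}
user_factor = [2.0, 1.0]
top2 = top_k_items(user_factor, item_factors, k=2, exclude={"c"})
print(top2)
```

Because each user's ranking is independent of every other user's, a loop over users like this can be split across processes or machines without coordination.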


User 945 | 11/14/2014, 8:36:26 AM

Thanks @DannyBickson, but is there anything I can try with the old GraphLab so that I can get something working until WALS is added to GraphLab Create?


User 6 | 11/14/2014, 3:18:05 PM

We have much improved algorithms now. Some examples:
1) We support side features. If you have additional information about the users (age, zip code, etc.), about the items (weight, color, price, etc.), or about the rating (time, day of week, etc.), we can incorporate all of it to improve the quality of the recommendations.
2) We support cold start. If you have no purchase information about a user or item but still have relevant side features, we can use those features to improve the predictions.
3) We also support ranking and implicit ratings.

Once WALS is released we will post a notice in this forum. Please be patient.


User 19 | 11/14/2014, 4:27:24 PM

In the meantime, I also recommend that you try out <a href="http://graphlab.com/products/create/docs/generated/graphlab.recommender.rankingfactorizationrecommender.create.html#graphlab.recommender.rankingfactorizationrecommender.create">rankingfactorizationrecommender</a>.


User 945 | 11/19/2014, 11:03:05 AM

@DannyBickson I have one more question, about the input format for WALS in GraphLab. I create an input file with each row in the following format: userid productid weight rating.

How do the weight and rating that I enter in the input file affect the cost function?

Does the i-th row contribute the following term to the cost function: weight_i * (innerproduct(X_i, Y_i) - rating_i)^2 + regularization term?

That is, if the i-th row is "u_i p_i w_i r_i", and X_i and Y_i are the latent factors for u_i and p_i, is the term added by this row to the cost function w_i * (innerproduct(X_i, Y_i) - r_i)^2 + regularization?

I just want to be sure that this is how the input file is interpreted internally in GraphLab for WALS.

Thanks in advance.
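The interpretation proposed in this post can be written out as a short Python sketch. It encodes only the poster's hypothesis about how each "userid productid weight rating" row enters the cost; the thread does not confirm that GraphLab's WALS works this way. The function name, the toy factors, and the plain L2 form of the regularization term are assumptions.

```python
def weighted_squared_loss(rows, user_factors, item_factors, reg=0.0):
    """Sum of weight * (<X, Y> - rating)^2 over rows, plus an L2 penalty.

    rows: iterable of (user_id, product_id, weight, rating) tuples.
    user_factors / item_factors: dicts mapping id -> latent vector.
    reg: L2 regularization strength (the exact form is an assumption).
    """
    total = 0.0
    for user_id, product_id, weight, rating in rows:
        x = user_factors[user_id]
        y = item_factors[product_id]
        prediction = sum(a * b for a, b in zip(x, y))
        total += weight * (prediction - rating) ** 2
    # Squared norms of all latent vectors, users and items alike.
    sq_norm = sum(v * v for vec in user_factors.values() for v in vec)
    sq_norm += sum(v * v for vec in item_factors.values() for v in vec)
    return total + reg * sq_norm

# Toy example, made up for illustration.
user_factors = {"u1": [1.0, 2.0]}
item_factors = {"p1": [0.5, 0.5]}
rows = [("u1", "p1", 2.0, 1.0)]  # userid, productid, weight, rating
loss = weighted_squared_loss(rows, user_factors, item_factors, reg=0.1)
print(loss)
```

Under this reading, a row with twice the weight counts twice as much toward the squared error, while the rating is the value the inner product is pulled toward.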


User 89 | 12/1/2014, 8:14:08 PM

Hello Girish,

I'm following up on this after our recent release of GraphLab Create 1.1. With that release, we've implemented an implicit ALS solver for the recommender system that mimics the weighted ALS version implemented in GraphChi.

How are you intending to use the weight term in the equations above? The version of implicit ALS in GLC 1.1, described in <a href="labs.yahoo.com/files/HuKorenVolinsky-ICDM08.pdf">Collaborative Filtering for Implicit Feedback Datasets</a>, separates the observed user-item pairs from the unobserved ones, and then treats the ratings in the given set of observations as "weights" on the positive examples. Thus you still only have users, items, and ratings, but the objective is to distinguish between rated and unrated items. This tends to give very solid results in the recommender system context.

Would that formulation work for you?

-- Hoyt
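For reference, the objective in the cited paper (Hu, Koren, Volinsky) can be sketched in a few lines of Python. Each user-item pair gets a binary preference p (1 if the rating is positive, 0 otherwise) and a confidence c = 1 + alpha * rating, so unobserved pairs still appear in the loss with the minimum confidence of 1. This follows the paper's formulation; whether GraphLab Create matches it term for term is not stated in this thread. The function name and toy data are hypothetical.

```python
def implicit_als_loss(ratings, user_factors, item_factors, alpha=40.0, reg=0.0):
    """Loss from 'Collaborative Filtering for Implicit Feedback Datasets'.

    ratings: dict mapping (user_id, item_id) -> observed rating (> 0).
    Pairs missing from `ratings` are treated as rating 0: preference 0,
    confidence 1. The sum runs over ALL user-item pairs.
    """
    total = 0.0
    for u, x in user_factors.items():
        for i, y in item_factors.items():
            r = ratings.get((u, i), 0.0)
            preference = 1.0 if r > 0 else 0.0
            confidence = 1.0 + alpha * r
            prediction = sum(a * b for a, b in zip(x, y))
            total += confidence * (preference - prediction) ** 2
    # L2 penalty on all latent vectors, as in the paper.
    sq_norm = sum(v * v for vec in user_factors.values() for v in vec)
    sq_norm += sum(v * v for vec in item_factors.values() for v in vec)
    return total + reg * sq_norm

# Toy example, made up for illustration: one observed pair, one unobserved.
user_factors = {"u1": [1.0, 0.0]}
item_factors = {"p1": [0.5, 0.0], "p2": [0.5, 0.0]}
ratings = {("u1", "p1"): 2.0}
loss = implicit_als_loss(ratings, user_factors, item_factors, alpha=1.0)
print(loss)
```

In the toy example the observed pair contributes 3 * (1 - 0.5)^2 = 0.75 and the unobserved pair contributes 1 * (0 - 0.5)^2 = 0.25, which shows how the rating acts as a weight on the positive example rather than as a regression target.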


User 945 | 12/2/2014, 6:00:20 AM

@Hoyt Thanks, Hoyt. Implicit ALS should work. The documentation does not state that, for implicit ALS, ratings are treated as weights for positive samples. I still have one more question. In the paper "Collaborative Filtering for Implicit Feedback Datasets", the unrated pairs are also included in the error term with a small confidence value. Does GraphLab Create do exactly the same with implicit ALS?


User 1025 | 12/10/2014, 11:20:14 PM

@DannyBickson, you mention the ability to include "additional information about the rating like time." Could you please point me to the API docs or an example where that is explained? I can't find that information in the documentation. Thanks in advance.


User 6 | 12/12/2014, 6:33:56 AM

Hi, please find attached an example notebook that shows how to use side features to improve the accuracy of flight time predictions. You will need to change the file suffix from txt to ipynb and open it in IPython. Let us know if you have any additional questions.