Implicit Feedback in Collaborative Filtering

User 794 | 10/2/2014, 3:07:59 AM

First off, I'd like to say: GraphLab Create is wicked! I love the SFrame, I love the training speed of the ML algorithms, and I love the quick and easy visualization using the .show() function.

That being said, I'm curious whether Implicit Feedback CF algorithms are implemented in GraphLab Create (as they are in GraphLab). Is there a training option in GraphLab Create that allows for implicit feedback? Or should I stick with GraphLab if I need to train an Implicit Feedback CF model?

Thank you for the time, and thanks for making such an outstanding tool available to the python community!


User 19 | 10/2/2014, 4:33:51 AM

Thanks for the feedback! Yes, GraphLab Create has a few algorithms appropriate for situations where you only have implicit feedback. In the recommender toolkit, implicit feedback algorithms are used whenever you don't provide a target column. You may be interested in <code class="CodeInline">method="item_similarity"</code> and <code class="CodeInline">similarity_type="jaccard"</code> or <code class="CodeInline">similarity_type="cosine"</code>.
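
As a rough illustration of what Jaccard item similarity computes on implicit data, here is a minimal pure-Python sketch. It is not GraphLab Create's implementation; the function name `jaccard_item_similarity` and the toy observations are made up for this example. Each item is represented by the set of users who interacted with it, and two items are compared by the size of the intersection of those sets divided by the size of their union.

```python
from collections import defaultdict


def jaccard_item_similarity(pairs):
    """Return {(item_a, item_b): jaccard} for items that share at least one user.

    Hypothetical helper for illustration only, not a GraphLab Create API.
    """
    users_by_item = defaultdict(set)
    for user, item in pairs:
        # Sets ignore repeats, so a duplicated (user, item) pair counts once.
        users_by_item[item].add(user)

    items = sorted(users_by_item)
    sims = {}
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            shared = users_by_item[a] & users_by_item[b]
            if shared:
                union = users_by_item[a] | users_by_item[b]
                sims[(a, b)] = len(shared) / len(union)
    return sims


# Toy implicit-feedback log: no ratings, only (user, item) events.
observations = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"), ("u2", "C"),
    ("u3", "C"),
    ("u1", "A"),  # repeated event, has no effect on the result
]
similarities = jaccard_item_similarity(observations)
```

In this toy log, items A and B are seen by exactly the same users, so their similarity is 1.0, while A and C share one user out of three.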

For more details, be sure to check out our recent blog post on choosing a recommender as well as our user guide. If you have any trouble, don't hesitate to ask us questions!

User 18 | 10/2/2014, 6:09:14 AM

We are also planning to add more algorithms that deal with implicit feedback. We're evaluating different options. If you have a favorite one, we'd love to hear about it!

User 794 | 10/15/2014, 8:36:24 PM

Thanks for the replies, Chris and Alicez! Sorry for the late response.

@Chris, thanks for the article; it was very insightful. Item_similarity is useful, but one limitation is that it does not take into account the number of times a (user, item) pair occurs in the data (i.e., user X watched movie Y k times).

Furthermore, the method of 'bucketizing' the frequency of (user,item) pairs is less than ideal in the matrix factorization setting since it adds additional parameters to the 'bucketize' function.

@Alicez, what I had in mind for implicit feedback datasets was more in line with the model described in this paper:

In this model, matrix factorization is applied to binary indicators for (user, item) pairs, with an additional 'confidence' weight multiplying the squared-error terms in the objective function.

Basically, it's standard matrix factorization on binary targets, with weights in the objective function that are proportional to the number of times a (user, item) pair is observed.

The release notes for 1.0 say: "Added new recommender models for implicit data". Is there anywhere I can take a look at the models that were added in this release?


User 89 | 10/16/2014, 8:20:26 PM

Hello MiroslawH,

Implementing that specific algorithm is on our roadmap, but it's not in this release. The one we do have implemented is similar to BPR, and it seems to work quite well on the datasets we have tested it on.

In this case, when there are no target values, we use logistic loss to fit a model that attempts to predict all the given (user, item) pairs in the training data as 1 and all others as 0. To train this model, we sample an unobserved item along with each observed (user, item) pair, using SGD to push the score of the observed pair towards 1 and the unobserved pair towards 0.

To choose the unobserved pair complementing a given observation, the algorithm selects several (defaults to four) candidate negative items that the user in the given observation has not rated. The algorithm scores each one using the current model, then chooses the item with the largest predicted score. This adaptive sampling strategy provides faster convergence than just sampling a single negative item.
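
The sampling scheme described above can be sketched roughly as follows. This is a toy pure-Python illustration, not the GraphLab Create code: the function names, the learning rate, the factor dimension, and the toy data are all assumptions, and details such as bias terms and regularization are omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(user_vec, item_vec):
    return sum(a * b for a, b in zip(user_vec, item_vec))


def pick_hard_negative(user, seen, items, user_f, item_f, rng, num_candidates=4):
    """Sample up to num_candidates unseen items and return the highest-scoring one."""
    unseen = [i for i in items if i not in seen[user]]
    if not unseen:
        return None  # user has interacted with every item
    candidates = rng.sample(unseen, min(num_candidates, len(unseen)))
    return max(candidates, key=lambda i: score(user_f[user], item_f[i]))


def sgd_step(user, item, label, user_f, item_f, lr=0.1):
    """One logistic-loss gradient step pushing sigmoid(score) toward label."""
    u, v = user_f[user], item_f[item]
    err = sigmoid(score(u, v)) - label  # gradient of the loss w.r.t. the score
    for k in range(len(u)):
        u_k, v_k = u[k], v[k]
        u[k] -= lr * err * v_k
        v[k] -= lr * err * u_k


def train(pairs, items, dim=4, epochs=50, seed=0):
    rng = random.Random(seed)
    users = sorted({u for u, _ in pairs})
    seen = {u: set() for u in users}
    for u, i in pairs:
        seen[u].add(i)
    user_f = {u: [rng.gauss(0, 0.1) for _ in range(dim)] for u in users}
    item_f = {i: [rng.gauss(0, 0.1) for _ in range(dim)] for i in items}
    for _ in range(epochs):
        for u, i in pairs:
            sgd_step(u, i, 1.0, user_f, item_f)        # observed pair toward 1
            neg = pick_hard_negative(u, seen, items, user_f, item_f, rng)
            if neg is not None:
                sgd_step(u, neg, 0.0, user_f, item_f)  # sampled negative toward 0
    return user_f, item_f


# Toy data: hypothetical users and items.
toy_items = ["A", "B", "C", "D", "E", "F"]
toy_pairs = [("u1", "A"), ("u1", "B"), ("u2", "B"), ("u2", "C")]
toy_user_f, toy_item_f = train(toy_pairs, toy_items)
```

Picking the highest-scoring candidate means each update is spent on the negative item the current model gets most wrong, which is the intuition behind the faster convergence mentioned above.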

We tried out several other pairwise SGD algorithms for factorizing implicit data, and this approach was the only one that consistently performed well. I'm curious what you think of this approach -- please let me know if it works well for you!

Thanks! -- Hoyt

User 794 | 10/28/2014, 3:03:23 PM

Thanks for the information Hoyt!

I have had a chance to try out the approach you described, but I seem to consistently get poorer results than with the ItemSimilarity model (not by much, ~0.1% difference in recall). This may be due to the nature of my dataset, which is very noisy.

I'm curious how the implementation for the above algorithm works. Would it be possible to mimic a user listening to the same song multiple times by including the (user, item) pair multiple times in the training data? Or does the algorithm drop duplicates prior to training?

Thanks -M

User 1137 | 5/2/2016, 4:45:55 PM

Hi @Hoyt,

Does GraphLab Create have an implementation of Factorization Machine for implicit feedback datasets?

The documentation says that by specifying linear_side_features=True, the ranking_factorization_recommender will use a factorization machine as the underlying model (I do have side information in my dataset, so this is not a problem). But does it also work for positive-only feedback (instead of ratings)?

Thanks, chunguo

User 19 | 5/2/2016, 4:50:15 PM

Hi chunguo,

Yes, the ranking_factorization_recommender can also work on implicit feedback datasets. This is done automatically for cases where no target argument is provided.

Cheers, Chris

User 1137 | 5/2/2016, 6:31:03 PM

Awesome! Thank you Chris!