User 902 | 11/4/2014, 12:11:03 PM

Hello,
I was trying the GraphLab Create item_similarity_recommender feature. The Jaccard and cosine metrics work nicely, but Pearson on binary implicit input behaves strangely. The following code, for instance, returns no results:
<code>

import graphlab

# Binary implicit data: each row is one (user, item) interaction, with no rating column.
sf = graphlab.SFrame({'user_id': ['0', '0', '0', '1', '1', '2', '2', '2'], 'item_id': ['a', 'b', 'c', 'a', 'b', 'b', 'c', 'd']})
m = graphlab.item_similarity_recommender.create(sf, similarity_type='pearson')
m.get_similar_items(['a'])
</code>
It seems that all scores are equal to 0. Maybe I'm missing something here, but I think the Pearson correlation between items 'a' ([1,1,0]) and 'c' ([1,0,1]) should be -0.5 in this example, shouldn't it?
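For reference, here is a minimal sketch of the calculation I have in mind. It assumes missing user-item pairs count as 0 and uses NumPy's `np.corrcoef` rather than anything from GraphLab Create; the variable names are just for illustration.

```python
import numpy as np

# Dense indicator vectors over users '0', '1', '2' (1 = interaction, 0 = none).
# Assumption: a missing (user, item) pair is treated as 0.
item_a = np.array([1, 1, 0])
item_c = np.array([1, 0, 1])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is Pearson's r.
r = np.corrcoef(item_a, item_c)[0, 1]
print(round(r, 3))  # -0.5
```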

Regards

User 19 | 11/4/2014, 5:38:35 PM

The way we have implemented Pearson similarity is <a href="http://graphlab.com/products/create/docs/generated/graphlab.recommender.item_similarity_recommender.ItemSimilarityRecommender.html#graphlab.recommender.item_similarity_recommender.ItemSimilarityRecommender">documented here</a>. As you'll see, the mean rating is subtracted from each rating, and the sum is only over the users that items i and j have in common, U_ij. This means that for implicit data, all of the ratings are 1 and all of the mean ratings are 1, leaving a numerator of 0. We chose this definition to be consistent with <a href="http://en.wikipedia.org/wiki/Collaborative_filtering">Wikipedia's description</a>.
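To illustrate the effect, here is a rough sketch of that definition in plain Python. It is not GraphLab Create's actual code: the helper `pearson_common_users`, the dict-of-dicts layout, taking each item's mean over all of its ratings, and returning 0 when the denominator vanishes are all assumptions made for this example.

```python
from math import sqrt

def pearson_common_users(ratings, item_i, item_j):
    """Pearson similarity summed only over users who rated both items.

    `ratings` maps item -> {user: rating}. Hypothetical helper for illustration.
    """
    ri, rj = ratings[item_i], ratings[item_j]
    common = set(ri) & set(rj)  # U_ij: users that both items share
    # Assumption: each item's mean is taken over all of its observed ratings.
    mean_i = sum(ri.values()) / float(len(ri))
    mean_j = sum(rj.values()) / float(len(rj))
    num = sum((ri[u] - mean_i) * (rj[u] - mean_j) for u in common)
    den = (sqrt(sum((ri[u] - mean_i) ** 2 for u in common))
           * sqrt(sum((rj[u] - mean_j) ** 2 for u in common)))
    # Assumption: report 0.0 when the denominator is 0 instead of dividing.
    return num / den if den else 0.0

# The implicit data from the question: every observed rating is 1.
user_ids = ['0', '0', '0', '1', '1', '2', '2', '2']
item_ids = ['a', 'b', 'c', 'a', 'b', 'b', 'c', 'd']
ratings = {}
for user, item in zip(user_ids, item_ids):
    ratings.setdefault(item, {})[user] = 1

score_ac = pearson_common_users(ratings, 'a', 'c')
print(score_ac)  # 0.0
```

Every rating equals its item's mean, so each term in the numerator is 0 regardless of which pair of items is compared.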

Thank you for getting in touch. Please feel free to ask more questions!

User 902 | 11/5/2014, 11:39:05 AM

Thanks for your quick answer!

I understand. Your Pearson similarity doesn't treat missing values as 0; it ignores them in the calculation. I suppose this is an efficient way to deal with very large sparse vectors. However, I think this behaviour has 2 flaws:
- It's not useful for implicit binary data.
- It doesn't treat all item/user vectors as having the same dimension, so it's not consistent with other Pearson correlation implementations such as scipy.stats.pearsonr or R's cor.

User 1768 | 12/21/2015, 6:16:48 PM

Hi, I have an SFrame with n users, where each user has a time-series vector of values. For each pair of users, I need to compute the correlation between their time-series vectors over a fixed time period. The values are continuous. Is it possible to calculate Pearson’s rho in GraphLab Create? As far as I know, the Pearson correlation in GraphLab Create only supports a single feature value per user (not a vector) and only categorical values. Is that correct? Please advise if this is possible. Thanks