dealing with label imbalance in recommendation

User 3218 | 2/27/2016, 8:22:32 AM


I have a binary [thumbs-up/thumbs-down] recommender built with a factorization recommender (with side information). Thumbs-ups outnumber thumbs-downs 16:1, and I have more than 100M such labels. Given this skewed distribution, what are the best practices in GraphLab for dealing with label imbalance in recommendation? I know I can downsample the dominant class using SFrame.sample(), but I'm writing here to see if there is a better way to handle this.
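The global downsampling mentioned in the question can be sketched in plain Python. This is not GraphLab code. It assumes the labels are (user_id, item_id, label) tuples with thumbs-up encoded as 1 and thumbs-down as 0, and `downsample_majority` and `keep_fraction` are hypothetical names chosen for illustration. Every minority-class row is kept, and each majority-class row survives with probability `keep_fraction`:

```python
import random

def downsample_majority(rows, majority_label=1, keep_fraction=1/16, seed=0):
    """Keep all minority-label rows and a random fraction of majority-label rows.

    rows: iterable of (user_id, item_id, label) tuples (assumed layout);
    label 1 = thumbs-up, 0 = thumbs-down (assumed encoding).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    kept = []
    for user, item, label in rows:
        # Minority rows always pass; majority rows pass with probability keep_fraction.
        if label != majority_label or rng.random() < keep_fraction:
            kept.append((user, item, label))
    return kept
```

With `keep_fraction=1/16` the expected ratio becomes roughly 1:1. Keep in mind that a model trained on the resampled data sees a different base rate than the original 16:1, so its raw scores may need recalibration if you use them as probabilities.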



User 19 | 2/29/2016, 8:02:19 PM

Hi delip,

A ratio of 16:1 may be fine. Per-user recommendation performance depends on each user's own thumbs-up/thumbs-down ratio, so you may want to check whether some users are heavily imbalanced. For those users, I might try undersampling their thumbs-ups.
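The per-user check and targeted undersampling suggested above can be sketched in plain Python, again assuming (user_id, item_id, label) tuples with 1 = thumbs-up. The function names, the `max_ratio` threshold, and the rule of treating a user with no thumbs-downs as if they had one are all hypothetical choices for this illustration, not GraphLab behavior:

```python
import random
from collections import defaultdict

def user_label_counts(rows):
    """Return {user_id: (n_up, n_down)} for (user_id, item_id, label) rows; label 1 = thumbs-up."""
    ups, downs = defaultdict(int), defaultdict(int)
    for user, _item, label in rows:
        if label == 1:
            ups[user] += 1
        else:
            downs[user] += 1
    return {u: (ups[u], downs[u]) for u in set(ups) | set(downs)}

def undersample_imbalanced_users(rows, max_ratio=16.0, seed=0):
    """Randomly drop thumbs-ups only for users whose up:down ratio exceeds max_ratio.

    Such a user keeps int(max_ratio * n_down) thumbs-ups; a user with no
    thumbs-downs is treated as if n_down == 1 (an arbitrary choice for this sketch).
    All thumbs-downs and all rows of balanced users are kept, in original order.
    """
    rng = random.Random(seed)
    rows = list(rows)
    counts = user_label_counts(rows)
    up_positions = defaultdict(list)  # user -> indices of that user's thumbs-up rows
    for i, (user, _item, label) in enumerate(rows):
        if label == 1:
            up_positions[user].append(i)
    drop = set()
    for user, positions in up_positions.items():
        target = int(max_ratio * max(counts[user][1], 1))
        if len(positions) > target:
            keep = set(rng.sample(positions, target))
            drop.update(p for p in positions if p not in keep)
    return [row for i, row in enumerate(rows) if i not in drop]
```

Inspecting `user_label_counts` first shows how many users are actually skewed; only those users lose data, which leaves the rest of the 100M labels intact instead of thinning them uniformly.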

Cheers, Chris