User 2488 | 10/26/2015, 5:26:41 PM
My goal is to create a course recommendation engine. My current approach involves first splitting all the historical data to which I have access into a training set and a testing set. My question involves the practical implementation of this split. I see that GraphLab has a number of utility functions that might help me with this (notably, `SFrame.random_split` and `recommender.util.random_split_by_user`).
My confusion comes from thinking about how my recommendation system should intuitively operate. I think it should work as follows: a user comes to my system today, and recommendations are made for that user based on the data we have up to today. The idea, then, is something along the lines of a past / (present + future) split. As such, I think it makes sense to use back-testing to evaluate the models I end up choosing (after hyperparameter tuning, of course). I would use something like the past 3 months as my testing split, and the 12 months before that (for a total of 15 months of historical data) as my training split. Of course, the same user may appear in both sets, and the same (user, object) pair may appear in both sets.
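To be concrete about the kind of split I mean, here is a minimal sketch in plain Python. The function name `time_split` and the toy records are just illustrations of the idea; with GraphLab I imagine the same thing would be a boolean filter on an SFrame's timestamp column (something like `sf[sf['timestamp'] < cutoff]`), but I'm not sure that's the intended pattern:

```python
# Hypothetical sketch of the past / (present + future) split described above.
# All names here (time_split, the record layout) are illustrative, not GraphLab API.
from datetime import datetime, timedelta

def time_split(interactions, cutoff):
    """Split (user, item, timestamp) records at a cutoff date:
    everything strictly before the cutoff is training data,
    everything at or after it is testing data."""
    train = [r for r in interactions if r["timestamp"] < cutoff]
    test = [r for r in interactions if r["timestamp"] >= cutoff]
    return train, test

# Toy data: hold out roughly the last 3 months for back-testing.
now = datetime(2015, 10, 26)
cutoff = now - timedelta(days=90)
records = [
    {"user": "u1", "item": "course_a", "timestamp": datetime(2015, 1, 10)},
    {"user": "u1", "item": "course_b", "timestamp": datetime(2015, 9, 5)},
    {"user": "u2", "item": "course_a", "timestamp": datetime(2015, 10, 1)},
]
train, test = time_split(records, cutoff)
print(len(train), len(test))  # -> 1 2
```

Note that the same user (u1 here) can land in both splits, which is exactly the situation I'm asking about.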
Does anyone know how I could create these splits with GraphLab? If not, I am wondering how I should code up these splits in a way that stays consistent with GraphLab's create() and evaluate() methods. Should users belong to only one set or the other? What about (user, object) tuples?
Any ideas would be greatly appreciated. Thanks so much, everyone!