Back Testing

User 2488 | 10/26/2015, 5:26:41 PM

My goal is to create a course recommendation engine. My current approach involves first splitting all the historical data to which I have access into a training and testing set. My question concerns the practical implementation of this split. I see that graphlab has a number of utility functions that might help me with this (notably, SFrame.random_split and recommender.util.random_split_by_user).

My confusion comes from thinking about how my recommendation system should intuitively operate. I think it should work as follows: a user comes to my system today, and recommendations are made for that user based on the data we have up to today. The idea, then, is something along the lines of a past/present+future split. As such, I think it makes sense to use back-testing to evaluate the models I end up choosing (after hyperparameter tuning, of course). I would use something like the past 3 months for my testing split, and the 12 months before that (for a total of 15 months of historical data) for my training split. Of course, the same user may appear in both sets, and the same (user, object) pairing may appear in both sets.

Does anyone know how I could create these splits with graphlab? If not, I am wondering how I should code up these splits in a way that stays consistent with graphlab's create() and evaluate() methods. Should users belong to only one set or the other? What about (user, object) tuples?

Any ideas would be greatly appreciated. Thanks so much, everyone!

Comments

User 4 | 10/26/2015, 9:38:03 PM

Hi @chusteven, you can split the data on a datetime value using a logical filter. If you have a datetime column named "date", you could do the following (supposing you want to split on December 10, 2014):

from datetime import datetime

split_date = datetime(2014, 12, 10)  # datetime takes (year, month, day)
train = sf[sf['date'] < split_date]
test = sf[sf['date'] >= split_date]  # >= so rows on the boundary date aren't dropped

I think ideally you would want at least some users to be in both sets (train and test) -- otherwise you will only be testing on users the model has never seen before, and you will probably not get very good results (though if your real use case is to optimize for users you've never seen before, this may be a good real-world test). I think your train/test split should be realistic, in the sense that you evaluate the performance of the model in a situation similar to the one in which you will actually use it to make predictions. If most users will be new going forward (where the model will be used), it makes sense to also evaluate the model's performance using a test set where most users are new.
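
Once you have the two splits, they plug directly into the recommender's create() and evaluate() methods. A minimal sketch (the column names here are placeholders for whatever your data actually uses):

import graphlab as gl

# 'user_id' and 'item_id' are placeholder column names
model = gl.recommender.create(train, user_id='user_id', item_id='item_id')
results = model.evaluate(test)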


User 2488 | 11/3/2015, 11:59:01 PM

Thanks, @Zach! After more thoroughly reading the documentation on your random_split_by_user method, I figured that what I was really looking to do was slightly re-tool that particular utility function for my own needs. That is, I coded my own version that didn't take a random subset of my test users' items, but rather the most recent 30% (or whatever) of my test users' items, and then stored those (user, item) pairs in my test set.
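
For anyone who wants to do something similar, here is a rough sketch of what such a recency-based variant could look like. This is not the exact code I wrote; the function name, column names, and defaults are placeholders, and it assumes an SFrame with user, item, and datetime columns:

import graphlab as gl

def recency_split_by_user(sf, user_id='user_id', timestamp='date',
                          max_num_users=1000, item_test_proportion=0.3,
                          random_seed=0):
    """Like recommender.util.random_split_by_user, but instead of a random
    subset, the most recent item_test_proportion of each sampled test
    user's interactions go into the test set."""
    cols = sf.column_names()

    # Sample the users who will contribute rows to the test set.
    users = gl.SFrame({user_id: sf[user_id].unique()})
    if users.num_rows() > max_num_users:
        users = users.sample(float(max_num_users) / users.num_rows(),
                             seed=random_seed)
    test_users = users[user_id]

    # Rows for users not chosen for testing stay entirely in train.
    train = sf.filter_by(test_users, user_id, exclude=True)
    candidate = sf.filter_by(test_users, user_id)

    # Order each test user's rows by time and compute each row's position
    # within that user's history (0 = oldest interaction).
    candidate = candidate.sort([user_id, timestamp])
    candidate = candidate.add_row_number('_row')
    per_user = candidate.groupby(user_id,
                                 {'_n': gl.aggregate.COUNT(),
                                  '_first': gl.aggregate.MIN('_row')})
    candidate = candidate.join(per_user, on=user_id)
    candidate['_pos'] = candidate['_row'] - candidate['_first']

    # The most recent item_test_proportion of each user's rows go to test.
    is_test = candidate['_pos'] >= candidate['_n'] * (1.0 - item_test_proportion)
    test = candidate[is_test][cols]
    train = train.append(candidate[is_test == 0][cols])
    return train, test

The resulting train/test pair can then be passed to create() and evaluate() just like the output of random_split_by_user.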

There is some of the logic you mentioned in your response above in my version, for sure. So thanks for that idea.