Model Learning after deployment

User 2842 | 12/16/2015, 10:05:16 PM

Hello, I have a matrix factorization model deployed on AWS for an online store, and I need to update it with new data so that it excludes already purchased products from recommendations for new users. Is there a way to add new data to an existing model, or do I have to retrain the model on all the data after every order? Ideally I could pass in new observations, and the model would store them and use them to improve itself.

Thanks

Comments

User 1190 | 12/16/2015, 11:38:53 PM

Hi Alex,

If you just want to exclude some new observations, you can store them in an SFrame and pass it as the 'exclude' parameter to the recommend() function.
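To illustrate what exclusion does to a ranked list, here is a minimal plain-Python sketch that does not depend on any recommender library. The function name `filter_recommendations` and the dictionary and tuple shapes are hypothetical, chosen only for illustration; with the deployed model the same effect would come from the 'exclude' argument of recommend() mentioned above.

```python
from collections import defaultdict


def filter_recommendations(ranked, purchases, k=3):
    """Drop already purchased items from each user's ranked list.

    ranked:    dict mapping user -> list of items, best first (hypothetical shape)
    purchases: iterable of (user, item) pairs to exclude
    k:         number of recommendations to keep per user
    """
    excluded = defaultdict(set)
    for user, item in purchases:
        excluded[user].add(item)
    return {
        user: [item for item in items if item not in excluded[user]][:k]
        for user, items in ranked.items()
    }


# Toy data: model output before filtering, plus orders placed since training.
ranked = {"u1": ["a", "b", "c", "d"], "u2": ["b", "a", "d", "c"]}
new_orders = [("u1", "a"), ("u2", "d")]
top = filter_recommendations(ranked, new_orders, k=2)
```

Because the filter runs at query time, new orders take effect immediately without retraining.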

If you want to treat new users or new items more intelligently, the best approach for now is to use a background model (which has to be simple and fast to train) to serve new users or new items, and to retrain the foreground model nightly. For example, you might recommend the store's most popular items to a new user, since there are no observations about that user yet. Or you might insert the most popular items that are not covered by the trained model at, say, the third position in the final ranked list.
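The background/foreground split can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names `most_popular`, `recommend_for`, and `promote` are hypothetical, the foreground model is represented as a precomputed dictionary of ranked lists, and the background model is a simple popularity count.

```python
from collections import Counter


def most_popular(orders, k=5):
    """Background model: the k most frequently purchased items."""
    counts = Counter(item for _user, item in orders)
    return [item for item, _count in counts.most_common(k)]


def recommend_for(user, foreground, orders, k=3):
    """Serve known users from the foreground model, new users from popularity.

    foreground: dict mapping user -> ranked item list from the nightly model
                (hypothetical shape)
    """
    if user in foreground:
        return foreground[user][:k]
    return most_popular(orders, k)


def promote(ranked, item, position=2):
    """Insert an item the model has not seen at a fixed 0-based position."""
    if item in ranked:
        return list(ranked)
    result = list(ranked)
    result.insert(position, item)
    return result


orders = [("u1", "a"), ("u2", "a"), ("u3", "a"),
          ("u1", "b"), ("u2", "b"), ("u3", "c")]
foreground = {"u1": ["x", "y", "z"]}

known = recommend_for("u1", foreground, orders)             # from the foreground model
unknown = recommend_for("visitor", foreground, orders, k=2)  # popularity fallback
boosted = promote(known, "new_item")                         # third slot in the list
```

The popularity count is cheap enough to recompute on every order, which is what makes it suitable as the background model.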

Best, -jay


User 2842 | 12/16/2015, 11:45:05 PM

Hi Jay,

Thanks for your reply.

If you train a new, simple model only for the new users, it won't benefit from the historical data, so I don't think it will be very accurate. From what I found in the forum and documentation, I see two options. The first is to save the purchased items on the client's end, send them in the exclude field, and retrain the whole model every night. The second is to keep the historical data as an SFrame on the server, append new orders to that SFrame every hour, queue a retraining of the model, and automatically update the alias to point to the new model once training is done.
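The second option could look roughly like the plain-Python sketch below. Everything here is a hypothetical stand-in: `ModelRegistry` mimics an alias that points at one published model, `train` is a placeholder for the real (slow) training call, and the history is a list rather than an SFrame. The point is only the ordering: append, train, and switch the alias last.

```python
from collections import Counter


class ModelRegistry:
    """Hypothetical registry: keeps trained models and one live alias."""

    def __init__(self):
        self.models = {}
        self.alias = None

    def publish(self, version, model):
        # Store the model first, then repoint the alias, so the alias
        # never refers to a model that has not finished training.
        self.models[version] = model
        self.alias = version

    def live(self):
        return self.models[self.alias]


def train(orders):
    """Placeholder trainer: item purchase counts instead of a real model."""
    return Counter(item for _user, item in orders)


def retrain_cycle(history, new_orders, registry, version):
    """Append new orders, retrain on the full history, then swap the alias."""
    history.extend(new_orders)
    model = train(history)  # the previous alias keeps serving during training
    registry.publish(version, model)
    return model


history = [("u1", "a")]
registry = ModelRegistry()
retrain_cycle(history, [], registry, "v1")
retrain_cycle(history, [("u2", "a"), ("u2", "b")], registry, "v2")
```

Keeping the old version in the registry also leaves a simple way to roll back if a newly trained model misbehaves.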


User 1190 | 12/17/2015, 6:12:55 PM

If all you want to do is exclude items, then you are right: keeping them either on the client side or on the server side as an SFrame works. Using a new model for new users or new items will be less accurate than one trained with historical data. However, it could still be better than using only the historical data, since you don't know much about the new user anyway. It is better to split the cases: 1) when a new user comes in, 2) when a new item comes in, and 3) when a new order comes in. They should be treated very differently, and you can be very creative in how you handle 1) and 2). Case 3) is when you want to filter out purchased items, which we discussed first.