FactorizationRecommenders with side data are really slow in evaluation

User 3218 | 3/1/2016, 3:16:58 PM

Right now I'm evaluating a model that processes around 7 users per second. For context, before incorporating side data, evaluate() on the same environment would process around 100 users per second. The user and item side data are quite simple: (id, single-categorical-value), although there are a lot of them.

I'm already running this on a c3.8xlarge and I have set my GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS to 32. Is there anything I can do to speed this up?



User 3218 | 3/1/2016, 7:32:06 PM

Follow up: if instead of the model.evaluate() method I use model.predict() and evaluate the results manually with the gl.evaluation package, the entire dataset gets processed in a few seconds. Something weird is definitely going on in model.evaluate().
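For reference, the manual path is cheap because gl.evaluation.rmse() just computes a plain per-pair RMSE over the predictions. A minimal pure-Python equivalent (my own illustration, with no GraphLab dependency) of what that computation amounts to:

```python
import math

def rmse(targets, predictions):
    """Root-mean-square error over paired target/prediction sequences."""
    assert len(targets) == len(predictions) and targets
    total = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    return math.sqrt(total / len(targets))

# Toy check: four observed ratings vs. four model scores.
print(rmse([4.0, 3.0, 5.0, 2.0], [3.5, 3.0, 4.5, 2.5]))  # ~0.433
```

The work here is linear in the number of validation rows, which is why this path finishes in seconds.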

User 1207 | 3/1/2016, 9:29:44 PM

Hello Delip,

When side information is used, the calculation is a bit more complicated, but it's possible that other things are going on. Are there item ids in your side data that are not present in your main data? If so, the algorithm ends up scoring more items per recommendation than it would without the side data. Filtering unneeded items out of your side information should help.

As for the evaluate question: predict() is doing something different than the model's evaluate() method, because evaluation of recommender quality is based on which items are recommended. predict() scores a specific user-item interaction, whereas model.evaluate() typically uses the model's recommend() method. recommend() scores every item for each user and then returns the top-scoring items; these are then used to evaluate the quality of the recommendations. The gl.evaluation tools measure how accurate the scores are for individual user-item pairs, but do not look at precision/recall in this case.
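To make the cost difference concrete, here is a toy sketch (the function names are illustrative, not GraphLab's internals) of why recommend-based evaluation is expensive: every candidate item must be scored for each user before the top k can be taken, whereas predict() only scores the observed pairs:

```python
def recommend_top_k(score, user, items, k):
    # Score *every* candidate item for this user (the expensive part),
    # then keep the k highest-scoring ones.
    ranked = sorted(items, key=lambda item: score(user, item), reverse=True)
    return ranked[:k]

def precision_at_k(recommended, relevant, k):
    # Fraction of the top-k recommendations the user actually interacted with.
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / float(k)

# Toy scoring function: lower item ids score higher.
score = lambda user, item: -(item + user)
items = list(range(100))

recs = recommend_top_k(score, user=0, items=items, k=5)
print(recs)                                    # [0, 1, 2, 3, 4]
print(precision_at_k(recs, {0, 2, 9}, k=5))    # 0.4
```

With many items, the sort over all candidates per user dominates, which matches the 100-users/sec vs. 7-users/sec difference described above.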

Hope that helps! -- Hoyt

User 3218 | 3/1/2016, 10:30:02 PM

@hoytak thanks for that very clear explanation. So, say I have a training set and a validation set, and I build a model on the training set. I have two ways of computing RMSE: 1) calling model.evaluate(validation_set), or 2) calling model.predict(validation_set) and then using gl.evaluation.rmse() on the predictions. From your explanation I can't clearly see which of the two is better for model selection. Clearly both can be used; one is fast and the other is slow. But which is more robust, and why?

User 1207 | 3/2/2016, 7:15:42 PM

Hey Delip,

The difference is that model.evaluate also computes the precision and recall, which are much more expensive to compute. If you call model.evaluate_rmse(), you should see performance comparable to the predict() path, whereas model.evaluate_precision_recall() will be substantially slower, especially if you have a lot of items.
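A back-of-the-envelope cost model (my own illustration, not GraphLab's actual implementation) shows why the gap widens with the item count: the RMSE path does work proportional to the validation rows, while the precision/recall path must score every user-item combination before ranking.

```python
def rmse_evaluation_cost(n_observed_pairs):
    # An evaluate_rmse-style pass scores only the user-item pairs
    # present in the validation set.
    return n_observed_pairs

def precision_recall_evaluation_cost(n_users, n_items):
    # An evaluate_precision_recall-style pass scores every item for
    # every user before ranking, so work grows with n_users * n_items.
    return n_users * n_items

# E.g. 10,000 validation rows vs. 1,000 users x 50,000 items:
print(rmse_evaluation_cost(10000))                   # 10000 scorings
print(precision_recall_evaluation_cost(1000, 50000)) # 50000000 scorings
```

Under these (hypothetical) numbers the precision/recall path does 5,000x the scoring work, which is consistent with the slowdown reported in the first post.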

Does that explain the difference?

Thanks! -- Hoyt

User 3218 | 3/3/2016, 12:53:00 AM

I see. That makes sense. Thanks again for the explanation. So for regression problems (i.e., a real-valued target), you are better off calling evaluate_rmse() instead of evaluate().