When side information is used, the calculation is a bit more complicated, but something else may also be going on. Are there additional item ids in your side data that are not present in your main data? If so, the algorithm ends up scoring more items when making recommendations than it would if the side data were not present. Filtering those unneeded items out of your side information should speed up the recommendation step.
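If that's the case, something like the following should trim the side data down to the items that actually appear in your observation data. This is just a rough sketch; the file paths and column names (user_id, item_id) are placeholders for whatever your data uses:

```python
import graphlab as gl

# Main user-item interaction data
observations = gl.SFrame.read_csv('observations.csv')
# Side information keyed by item id
item_info = gl.SFrame.read_csv('item_side_data.csv')

# Keep only the side-data rows whose item id also appears in the
# observation data, so the model doesn't carry around extra items.
item_info = item_info.filter_by(observations['item_id'], 'item_id')

model = gl.recommender.create(observations,
                              user_id='user_id',
                              item_id='item_id',
                              item_data=item_info)
```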
As for the evaluate question: the predict() method does something different from the model's evaluate() method, because evaluation of recommender quality is based on which items are actually recommended. predict() scores a specific user-item interaction, whereas model.evaluate() typically uses the model's recommend() method. recommend() scores every item for each user and returns the top-scoring ones; those recommendations are then used to judge the quality of the model. The gl.evaluation tools, by contrast, measure how accurate the predicted scores are for individual user-item pairs, and do not look at precision/recall in this case.
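To make the distinction concrete, here's a rough sketch of the two paths. The rating target, the precision_recall metric choice, and the column names are assumptions on my part, so adjust them to match your data:

```python
import graphlab as gl

# Hold out some interactions per user for testing
train, test = gl.recommender.util.random_split_by_user(
    observations, user_id='user_id', item_id='item_id')

model = gl.recommender.create(train, user_id='user_id',
                              item_id='item_id', target='rating')

# Rank-based evaluation: uses recommend() under the hood, then reports
# precision/recall of the top-k recommended items against the held-out set.
rank_results = model.evaluate(test, metric='precision_recall')

# Score-based evaluation: predict() returns one score per user-item pair,
# and gl.evaluation.rmse compares those scores to the observed targets.
# No ranking of items is involved here.
predictions = model.predict(test)
rmse = gl.evaluation.rmse(test['rating'], predictions)
```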
Hope that helps!