User 3230 | 3/1/2016, 9:58:28 AM
After experimenting with the evaluation outputs, I realize that the reported precision and recall metrics do not really match the "true" metrics.
For example, user X has only 6 items and is part of the test set; 50% (3) of his items are withheld.
At cut-offs 1 and 2, the numbers cannot reflect the true picture, because the "recommender" has not been given enough slots to make 3 correct guesses. Conversely, even if all 3 items are guessed by K = 10, the precision must taper off, since there are "no more correct answers" left to guess.
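A toy illustration of the tapering effect (the item ids and scores are made up, not from my dataset) with an idealized recommender that ranks all 3 withheld items at the top of its list:

```python
# A user with 3 held-out items; the recommender puts all 3 first.
def precision_at_k(recommended, relevant, k):
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / float(k)

relevant = {"i1", "i2", "i3"}                      # the 3 withheld items
recommended = ["i1", "i2", "i3", "i4", "i5",
               "i6", "i7", "i8", "i9", "i10"]

for k in (1, 2, 3, 5, 10):
    print(k, precision_at_k(recommended, relevant, k))
# precision@3 is 1.0, but precision@10 drops to 0.3 even though
# every held-out item was recovered -- the metric has to taper.
```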
An example output from Graphlab's evaluate function:
What I did instead is to calculate TP, FP, TN, and FN from scratch, taking the maximum number of guesses to be the total catalogue size. I also defined a new cutoff, "number of guesses - number of holdout items", so that my ROC curve can be plotted over cutoffs greater than or equal to zero.
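Roughly, the calculation looks like this (a minimal sketch of the idea; the function name, toy catalogue, and holdout are illustrative, not my actual code):

```python
# Confusion matrix for one user at list length k, measured against
# the whole catalogue rather than just the top-k list.
def confusion_at_k(recommended, holdout, catalogue_size, k):
    tp = len(set(recommended[:k]) & set(holdout))  # held-out items recovered
    fp = k - tp                                    # recommended, not relevant
    fn = len(holdout) - tp                         # relevant, missed
    tn = catalogue_size - tp - fp - fn             # everything else
    return tp, fp, tn, fn

holdout = {"a", "b", "c"}                          # 3 withheld items
recommended = ["a", "x", "b", "y", "z", "c"] + ["i%d" % i for i in range(94)]
catalogue_size = 100

# ROC points over the shifted cutoff: k - |holdout| >= 0
for k in range(len(holdout), 11):
    tp, fp, tn, fn = confusion_at_k(recommended, holdout, catalogue_size, k)
    tpr = tp / float(tp + fn)
    fpr = fp / float(fp + tn)
    print(k - len(holdout), tpr, fpr)
```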
1) Does the solver in the ranking factorization engine take care of this? At the moment I am not sure whether it is truly minimizing the right quantity.
2) Is there a way I can implement this modified TP/FP/TN/FN in GraphLab to obtain the corrected precision and recall metrics?