add 'precision_recall_curve' metric to gl.boosted_trees_classifier.BoostedTreesClassifier.evaluate()

User 2785 | 3/28/2016, 10:30:16 PM

The evaluate method for the churn predictor model has a 'precision_recall_curve' metric option which makes it easy to plot the precision-recall curve. The boosted trees evaluate method doesn't have this, so I'm hoping it can be added sometime soon!

But since it's not an option yet, is there another way I can get equivalent output for a boosted trees model?

Example output from https://dato.com/learn/gallery/notebooks/customer-churn-prediction.html:

```
'precision_recall_curve': Columns:
    cutoffs    float
    precision  float
    recall     float

Rows: 5

Data:
+---------+----------------+-----------------+
| cutoffs |   precision    |      recall     |
+---------+----------------+-----------------+
|   0.1   | 0.707317073171 |  0.996183206107 |
|   0.25  | 0.72268907563  |  0.984732824427 |
|   0.5   | 0.751515151515 |  0.946564885496 |
|   0.75  |  0.80612244898 |  0.603053435115 |
|   0.9   | 0.882352941176 | 0.0572519083969 |
+---------+----------------+-----------------+
[5 rows x 3 columns]
```

Comments

User 2917 | 3/29/2016, 6:07:51 PM

Hello,

Thanks for the feedback, I'll share your feature request with the team.

You can compute the precision_recall_curve yourself without too much difficulty using the function below. It takes as input an SArray of ground-truth labels, an SArray of predicted probabilities, and a list of probability thresholds:

```python
import graphlab as gl

# Compute precision and recall for the given ground-truth labels,
# predicted probabilities, and probability cutoffs.
def precision_recall_curve(labels, probabilities, cutoffs):
    precision = [gl.evaluation.precision(labels, probabilities > cutoff)
                 for cutoff in cutoffs]
    recall = [gl.evaluation.recall(labels, probabilities > cutoff)
              for cutoff in cutoffs]
    return gl.SFrame({
        'cutoffs': cutoffs,
        'precision': precision,
        'recall': recall,
    })
```
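If you just want to sanity check the helper without training any model, you can run it on a few hand-made values; the labels and probabilities below are purely illustrative, not from the user guide data:

```python
import graphlab as gl

# Toy ground-truth labels and predicted probabilities (illustrative values only).
labels = gl.SArray([1, 0, 1, 1, 0, 1, 0, 0])
probabilities = gl.SArray([0.9, 0.4, 0.8, 0.35, 0.6, 0.7, 0.2, 0.55])

# Prints an SFrame with one row per cutoff.
print(precision_recall_curve(labels, probabilities, [0.25, 0.5, 0.75]))
```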

If you have test_data and predictions as defined in the Gradient Boosted Trees classifier example in the user guide, you can use the following code to test this function:

```python
# Define the probability thresholds for computing precision and recall.
cutoffs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

precision_recall_curve(test_data['label'], predictions['probability'], cutoffs)
```
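And since the original goal was plotting the curve, here is a rough sketch of how you might generate the predicted probabilities and plot the result with matplotlib. The model creation call and the train_data / test_data / 'label' names are assumptions based on the user guide example, not a prescribed recipe:

```python
import graphlab as gl
import matplotlib.pyplot as plt

# Assumed: train_data and test_data are SFrames with a binary 'label' column,
# as in the user guide's boosted trees classifier example.
model = gl.boosted_trees_classifier.create(train_data, target='label')

# Class-1 probabilities for the test set (an SArray), which can be passed
# to the helper directly in place of predictions['probability'].
probabilities = model.predict(test_data, output_type='probability')

cutoffs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
pr_curve = precision_recall_curve(test_data['label'], probabilities, cutoffs)

# Plot recall on the x-axis and precision on the y-axis.
plt.plot(list(pr_curve['recall']), list(pr_curve['precision']), marker='o')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-recall curve')
plt.show()
```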

Let me know if you have any questions about this!


User 2785 | 3/29/2016, 9:21:57 PM

This is great, thanks so much!