Discrepancy regarding 'confusion_matrix'

User 761 | 12/8/2014, 7:12:36 AM

Hello guys!

The confusion matrix returned by 'xxx.evaluate' differs from the one returned by 'evaluation.confusion_matrix(targets, predictions)'. Concretely, the one generated by '.evaluate' is wrong, while the one generated by 'evaluation.confusion_matrix()' seems to be correct.

Consider the dummy example below: the two confusion matrices are different. The 'target_label' and 'predicted_label' columns appear to be interchanged in the '.evaluate' version. Have I made a mistake in using any of the functions, or is this a bug?


from graphlab import *
from random import random, randint

# Dummy data: one random feature and a random binary label
sfnew = SFrame()
sfnew['data'] = [random() for i in xrange(0, 100)]
sfnew['labels'] = [randint(0, 1) for i in xrange(0, 100)]

# Train on the dummy data and evaluate on the same data
clfnew = boosted_trees_classifier.create(sfnew, target='labels')
print clfnew.evaluate(sfnew)

PROGRESS: Boosted trees classifier:

Number of examples : 100 (45 positives, 55 negatives)
Number of feature columns : 1
Number of unpacked features : 1
PROGRESS: Starting Boosted Trees
PROGRESS: --------------------------------------------------------
PROGRESS:   Iter    Accuracy    Elapsed time
PROGRESS:      0   7.500e-01           0.01s
PROGRESS:      1   8.200e-01           0.01s
PROGRESS:      2   8.400e-01           0.01s
PROGRESS:      3   8.400e-01           0.01s
PROGRESS:      4   8.600e-01           0.01s
PROGRESS:      5   8.400e-01           0.01s
PROGRESS:      6   8.600e-01           0.01s
PROGRESS:      7   8.700e-01           0.01s
PROGRESS:      8   8.600e-01           0.01s
PROGRESS:      9   8.700e-01           0.02s
{'confusion_matrix': Columns:
    target_label str
    predicted_label str
    count int

Rows: 4

Data:
+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        0        |   51  |
|      0       |        1        |   9   |
|      1       |        0        |   4   |
|      1       |        1        |   36  |
+--------------+-----------------+-------+
[4 rows x 3 columns]
, 'accuracy': 0.87}

targets = sfnew['labels']
predictions = clfnew.classify(sfnew)['class']
print evaluation.confusion_matrix(targets, predictions)

+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      0       |        0        |   51  |
|      0       |        1        |   4   |
|      1       |        0        |   9   |
|      1       |        1        |   36  |
+--------------+-----------------+-------+
[4 rows x 3 columns]
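
One way to check which of the two tables is right is to compare the per-target totals with the class sizes in the training log (45 positives, 55 negatives). The sketch below is plain Python and does not use GraphLab. The counts are copied by hand from the two tables above, and the helper names are made up for illustration.

```python
# Counts copied from the two tables, keyed by (target_label, predicted_label).
from_evaluate = {(0, 0): 51, (0, 1): 9, (1, 0): 4, (1, 1): 36}
from_evaluation_module = {(0, 0): 51, (0, 1): 4, (1, 0): 9, (1, 1): 36}

# Class sizes reported in the training log: 55 negatives (0), 45 positives (1).
class_sizes = {0: 55, 1: 45}


def totals_per_target(matrix):
    """Sum the counts for each target label."""
    totals = {}
    for (target, _predicted), count in matrix.items():
        totals[target] = totals.get(target, 0) + count
    return totals


def totals_per_prediction(matrix):
    """Sum the counts for each predicted label."""
    totals = {}
    for (_target, predicted), count in matrix.items():
        totals[predicted] = totals.get(predicted, 0) + count
    return totals


# A correct matrix must have per-target totals equal to the class sizes.
evaluate_ok = totals_per_target(from_evaluate) == class_sizes
module_ok = totals_per_target(from_evaluation_module) == class_sizes

# If the two label columns were swapped, the per-prediction totals of the
# '.evaluate' table would match the class sizes instead.
looks_swapped = totals_per_prediction(from_evaluate) == class_sizes
```

With these numbers, only the table from the evaluation module has per-target totals of 55 and 45. The '.evaluate' table gives 60 and 40, but its per-prediction totals are 55 and 45, which is consistent with the two label columns being interchanged.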


Comments

User 91 | 12/8/2014, 8:07:02 AM

Thank you for finding this issue. I believe you are right. It is a bug with boosted trees in the 1.1 release. It will be fixed in the next release. For now, you can use the GraphLab evaluation module.
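
Until the fix lands, the counts can also be cross-checked without either GraphLab routine. The following is a minimal sketch in plain Python, assuming the targets and predictions have been converted to ordinary lists of equal length. The helper name confusion_counts and the sample labels are hypothetical and not part of GraphLab.

```python
from collections import Counter


def confusion_counts(targets, predictions):
    """Tally how often each (target, predicted) pair occurs."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return Counter(zip(targets, predictions))


# Tiny made-up example, only to show the shape of the result.
sample_targets = [0, 0, 1, 1, 1]
sample_predictions = [0, 1, 1, 1, 0]
sample_counts = confusion_counts(sample_targets, sample_predictions)
```

Each key of the result is a (target, predicted) pair, so the tally can be compared row by row with the 'target_label' and 'predicted_label' columns of either table.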

We will keep you posted. Sorry for the bother!