Boosted decision tree output indices

User 2356 | 10/9/2015, 9:50:38 AM

When I extract the tree feature indices I get the output below. The model has 53 classes, but many of the indices have values well beyond 53. What do these indices mean, and how can I get programmatic access to the decision tree nodes that the model.show function displays?

```
data['boosted_tree_features'] = model.extract_features(data)

data['boosted_tree_features'][0]

array('d', [20.0, 24.0, 65.0, 66.0, 20.0, 20.0, 16.0, 81.0, 112.0, 82.0, 64.0, 90.0, 62.0, 68.0, 32.0, 12.0, 12.0, 20.0, 66.0, 28.0, ...])
```

Comments

User 940 | 10/10/2015, 8:59:40 PM

Hi @abby,

Each tree can have more leaves than there are classes, since several decision paths can lead to the same class. The extracted features are, for each tree in the ensemble, the index of the leaf into which the instance fell, which is why the indices can exceed the number of classes.
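The same idea can be seen with scikit-learn's gradient boosting, whose `apply` method is an analogue of `extract_features` (this is an illustration, not the GraphLab Create API): it returns the leaf index each row reaches in every tree, and those indices routinely exceed the number of classes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
clf = GradientBoostingClassifier(n_estimators=5, max_depth=3,
                                 random_state=0).fit(X, y)

# apply() returns, for each row, the id of the leaf node it lands in within
# every tree: shape (n_samples, n_estimators, n_classes), one tree per class
# per boosting stage. Leaf ids are node indices, not class labels.
leaves = clf.apply(X)
print(leaves.shape)   # (150, 5, 3)
print(leaves[0])      # leaf index per tree for the first sample
```

Because the values are node indices inside each tree, a 53-class model with deep trees will naturally produce feature values far above 53.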

You can get programmatic access to the tree nodes via model['trees_json'].
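If `trees_json` does load, a reasonable assumption (not a documented GraphLab Create schema; the field names below follow the XGBoost-style dump format and are hypothetical) is one JSON string per tree, each describing its nodes. A minimal sketch of decoding such a payload with the standard library:

```python
import json

# Hypothetical payload of the kind trees_json may hold: one JSON string per
# tree. The keys ("split_feature", "value", "yes", "no", "leaf") are an
# assumption made for illustration, not a documented schema.
trees_json = [
    '{"id": 0, "split_feature": 2, "value": 2.45,'
    ' "yes": {"id": 1, "leaf": 0.9}, "no": {"id": 2, "leaf": -0.9}}',
]

def count_leaves(node):
    """Recursively count leaf nodes in one decoded tree dict."""
    if "leaf" in node:
        return 1
    return count_leaves(node["yes"]) + count_leaves(node["no"])

trees = [json.loads(t) for t in trees_json]
print(count_leaves(trees[0]))   # -> 2
```

Once decoded into plain dicts, the trees can be walked, counted, or re-serialized with the usual `json` tools.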

Let us know if you have any more questions!

Cheers! -Piotr


User 2356 | 10/30/2015, 1:11:06 PM

`j = model['trees_json']`

returns no result, and then issuing `type(j)` takes forever; I have to interrupt the kernel.

Also, `open('tree.json', 'w').write(j)` does not write anything to the file.

@piotr


User 2356 | 10/30/2015, 1:13:27 PM

Also, @piotr, how can we do a backward analysis and find the decision path that caused the tree to output a particular label?
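One way to sketch that kind of backward analysis, again using scikit-learn's tree internals as a stand-in (a GraphLab model would need the `trees_json` dump walked the same way): start at the root of one tree and replay the split tests for a sample, recording every node visited until a leaf is reached.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
clf = GradientBoostingClassifier(n_estimators=3, max_depth=2,
                                 random_state=0).fit(X, y)

# Take the stage-0 tree for class 0 and walk sample 0 from the root down,
# recording every node id until we reach a leaf (children_left == -1).
tree = clf.estimators_[0, 0].tree_
node, path = 0, [0]
while tree.children_left[node] != -1:
    if X[0, tree.feature[node]] <= tree.threshold[node]:
        node = tree.children_left[node]
    else:
        node = tree.children_right[node]
    path.append(node)
print(path)   # root-to-leaf node ids for sample 0 in this tree
```

The recorded node ids, together with each node's feature and threshold, give exactly the sequence of decisions that led to the leaf (and hence the output) for that sample.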