How does GraphLab accumulate scores of all gradient boosting trees to final predictions

User 1843 | 4/29/2015, 6:00:34 PM

Hi,

Can someone share some insight into how GraphLab calculates the predictions for the Gradient Boosted Trees Classifier in terms of probability? In one of the Gradient Boosted Trees Classifier models I created, I used 8 trees, and I tried one sample for prediction. Based on the JSON trees, the leaf-node values that the sample landed in are: -0.536623 -0.346335 -0.003302 -0.206162 0.114814 0.096973 -0.514708 -0.152272

And when I used the model to predict this single sample with probability output type, the result is 0.17543102667164928. Can someone kindly explain how that result is calculated based on the scores I listed above? Apparently summing or averaging those scores didn't give me that result.

BTW, I used class weights {0:1, 1:12}, step size 0.3, max_depth 6, and min_child_weight 0.1 in my training. Not sure if they are used in the predictions.

Thanks

Comments

User 1190 | 5/4/2015, 6:07:49 PM

Hi @Bruce_Yang,

The leaf nodes store weights, which are transformed into probabilities via the logistic function. If you construct only one tree and observe weight "w" at a particular leaf node, the probability is 1 / (1 + exp(-w)).

With boosted trees, the weights are summed and then transformed into a probability: 1 / (1 + exp(-(w0 + w1 + w2 + ...))).

In your case, let W = -0.536623 + (-0.346335) + ...; you should then get back prob = 1/(1+exp(-W)) ≈ 0.175.
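The calculation above can be reproduced with plain Python (no GraphLab API involved) using the eight leaf values from the question:

```python
import math

# Leaf values the sample landed in, one per tree (from the question above)
leaf_values = [-0.536623, -0.346335, -0.003302, -0.206162,
               0.114814, 0.096973, -0.514708, -0.152272]

# Boosted trees sum the leaf weights into a single margin score...
margin = sum(leaf_values)

# ...which the logistic (sigmoid) function maps to a probability
prob = 1.0 / (1.0 + math.exp(-margin))

print(margin)  # about -1.547615
print(prob)    # about 0.175431, matching the model's reported probability
```

This matches the 0.17543102667164928 reported by the model for that sample.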


User 3066 | 1/19/2016, 2:39:32 AM

It's clear to me how the probability is calculated. However, which class does this probability refer to? And how does the model decide the predicted class based on this probability?