problems using random forest classifier

User 4234 | 4/3/2016, 12:48:20 PM

Hi guys,

I am learning GraphLab now. When I try to use a random forest for a classification task, I get the following error:

    [ERROR] graphlab.toolkits.main, 66: Toolkit error: Option 'maxiterations' not recognized.

I was just copying the code from https://dato.com/learn/userguide/supervised-learning/randomforestclassifier.html

Comments

User 91 | 4/4/2016, 9:41:54 PM

I think that is a typo in the documentation. The parameter is called num_trees for random forest. Sorry for the bother. We will fix that ASAP.


User 4234 | 4/5/2016, 5:17:28 AM

Thank you for clarifying this.

Then it makes sense for the random forest.


User 4234 | 4/6/2016, 7:51:56 AM

I changed the parameter name to num_trees and reran the code.

It ran smoothly with no errors. However, the parameter seems to actually behave like "max_iterations".

I set num_trees to 50, and training stopped after 50 iterations, like this:

    | 49 | 2693.222271 | 0.291442 | 2.870682 | 0.277149 | 2.995834 |
    | 50 | 2749.105812 | 0.291800 | 2.870653 | 0.277274 | 2.996720 |
    +-----------+--------------+-------------------+-------------------+---------------------

And when I print the model, it says that the number of trees is 1950, like this:

    Settings

    Number of trees : 1950
    Max tree depth : 15
    Training time (sec) : 2749.154
    Training accuracy : 0.2918


User 940 | 4/6/2016, 10:09:12 PM

Hi @DataGeek,

You're right, the parameter is a bit misleading. Basically, each iteration builds (number of classes - 1) trees, where each tree is a one-vs-all classifier. So you end up with num_trees * (number of classes - 1) trees in total.

It's algorithmically not an iteration though, since each iteration is not dependent on the previous one. They are independent sets of trees.
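As a rough check of that arithmetic against the numbers reported above, here is a small sketch in plain Python. It does not call GraphLab at all; the helper names `total_trees` and `infer_num_classes` are made up for illustration, and the class count of 40 is only inferred from 1950 / 50 = 39, since the thread never states it.

```python
def total_trees(num_trees, num_classes):
    """Total trees under the rule described above: each of the
    `num_trees` rounds builds (num_classes - 1) one-vs-all trees."""
    if num_trees < 1 or num_classes < 2:
        raise ValueError("need num_trees >= 1 and num_classes >= 2")
    return num_trees * (num_classes - 1)


def infer_num_classes(reported_total, num_trees):
    """Invert the rule: recover the class count from the reported total."""
    per_round, remainder = divmod(reported_total, num_trees)
    if remainder != 0:
        raise ValueError("reported total is not a multiple of num_trees")
    return per_round + 1


# Numbers from the thread: num_trees=50, model summary reports 1950 trees.
# ASSUMPTION: 40 classes, inferred from those two numbers only.
print(infer_num_classes(1950, 50))  # 40
print(total_trees(50, 40))          # 1950
```

If the rule holds, a binary problem (2 classes) would report exactly num_trees trees, which is probably why the naming only looks odd on multiclass data.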

I hope this makes sense.

Cheers! -Piotr