Boosted trees CPU utilization

User 3069 | 2/2/2016, 9:18:29 AM

I wonder why the CPU load is only about 400% while the decision trees are training, when 1600% is available (16 cores). Is there some way to tune this up? Maybe there is some other runtime parameter dedicated to that, like 'GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS' and 'GRAPHLAB_DEFAULT_NUM_GRAPH_LAMBDA_WORKERS'?

Is there any other advice for speeding up this model's training time, besides reducing max_iterations and max_depth?


User 3069 | 2/2/2016, 7:26:29 PM

I'm not sure whether the measured CPU utilization is accurate. Either way, suggestions on how to speed up the process would be appreciated.

User 91 | 2/2/2016, 9:15:33 PM

Other than reducing max_iterations and max_depth, nothing obvious comes to mind. The latest version of graphlab-create improves performance by around 2x.

We love making things go fast! What is your expectation of the speed for the boosted trees?

We would appreciate more information about what your dataset looks like (schema, number of columns, number of numeric columns, cardinality of categorical columns if any).
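As a rough illustration of the kind of summary that helps here, the sketch below counts numeric columns and the cardinality of each column. It uses plain Python on a small made-up list of rows rather than any GraphLab API; `summarize_columns` and the sample data are hypothetical names and values chosen only for this example.

```python
def summarize_columns(rows):
    """Return, per column, whether it is numeric and how many distinct values it has."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            info = summary.setdefault(name, {"numeric": True, "values": set()})
            # bool is a subclass of int, so exclude it from the numeric check
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                info["numeric"] = False
            info["values"].add(value)
    return {
        name: {"numeric": info["numeric"], "cardinality": len(info["values"])}
        for name, info in summary.items()
    }


# Hypothetical sample rows, standing in for a real dataset
rows = [
    {"age": 31, "income": 52000.0, "city": "Oslo"},
    {"age": 45, "income": 61000.0, "city": "Bergen"},
    {"age": 31, "income": 48000.0, "city": "Oslo"},
]

report = summarize_columns(rows)
numeric_columns = sorted(name for name, info in report.items() if info["numeric"])

print("columns:", len(report))
print("numeric columns:", numeric_columns)
for name in sorted(report):
    print(name, report[name])
```

For a real dataset, the same numbers (column count, numeric column count, distinct values per categorical column) are what would be worth posting.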

It would also be good to know what the configuration of your machine is. Do you have enough memory? If you don't, we spill to disk, which will make things a bit slower.
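One way to collect the basic machine details is sketched below, using only the Python standard library. The `machine_config` helper is a hypothetical name, and the memory lookup assumes a Unix-like system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`; on other platforms the memory figure is simply left as `None`.

```python
import multiprocessing
import os


def machine_config():
    """Collect the logical core count and total physical memory, where available."""
    config = {
        "logical_cores": multiprocessing.cpu_count(),
        "total_memory_gb": None,
    }
    try:
        # Assumption: these sysconf names exist (true on Linux; not on Windows)
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
        config["total_memory_gb"] = round(page_size * page_count / float(1024 ** 3), 1)
    except (AttributeError, ValueError, OSError):
        # Leave total_memory_gb as None when the lookup is not supported
        pass
    return config


config = machine_config()
print(config)
```

Comparing the reported memory with the size of the dataset gives a first hint of whether training is likely to spill to disk.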