Unable to set the solver in gl.logistic_classifier.create

User 1067 | 12/15/2014, 5:05:51 PM

Hi,

I tried to change the solver of a logistic model to 'newton', which seems more sensible for my dataset, but the solver stays at 'lbfgs' (or 'fista' if I set an L1 penalty).
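
For reference, the call looks roughly like this (a minimal sketch; 'target' and the feature names are just placeholders for my actual columns):

import graphlab as gl

# Explicitly request the Newton solver (placeholder column names)
model = gl.logistic_classifier.create(sf, target='target',
                                      features=['X1', 'X2'],
                                      solver='newton')

Even with solver='newton', the training progress reports 'lbfgs' (or 'fista' with an L1 penalty).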

Comments

User 91 | 12/15/2014, 6:19:20 PM

Hi Hanna,

Is there a particular issue with the solver that was chosen? We try to choose the solver that gives the best performance based on the characteristics of your data. Could you tell us a bit about your data?

The Newton method is currently only implemented for L2 regularization or no regularization. We haven't yet gotten around to implementing the Newton method for L1-penalized models.

Secondly, we choose LBFGS when your problem has many coefficients (more than 500) because the Newton method's per-iteration complexity is cubic in the number of coefficients, whereas LBFGS is linear in the number of coefficients.
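
As a rough check of which regime your data falls into, you can estimate how many coefficients dummy-encoding your categorical columns will produce (a sketch under the assumption of roughly one coefficient per category level plus an intercept; 'categorical_cols' is a placeholder list, not something the model reports):

# Rough estimate of the number of coefficients after dummy encoding
categorical_cols = ['X1', 'X2']  # placeholder feature names
num_coefficients = 1 + sum(len(sf[col].unique()) for col in categorical_cols)
print(num_coefficients)  # above ~500, the automatic choice falls back to LBFGS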


User 1067 | 12/17/2014, 6:23:23 PM

Hi Srikris,

My dataset contains only categorical features with high cardinality. By the way, when a categorical variable with high cardinality (several thousand levels in my dataset) is dummified, is there a limit on the size of the resulting matrix (e.g. dummify the 49 most frequent categories and group the smaller ones into one "other" category)?

Initially I tried to "tweak" the solver because LBFGS didn't converge on my dataset. Fista did, and it gives sensible predictions.

thanks,


User 91 | 12/17/2014, 6:54:08 PM

Hanna,

It is unfortunate (and strange) that LBFGS didn't converge on your dataset. Can I see the log produced by the solver? Was it a numerical error? I am glad that Fista converged and gave a sensible prediction.

Currently, we do not automatically bucket categorical variables into a "junk" category. It is definitely something we are considering for the future, and we will get back to you on that. Right now, you can do it yourself with SFrame operations using groupby. Here is how I would do it (let us assume I have an SFrame sf with a feature 'X1'):

Start by counting how many times each category of 'X1' occurs:

sf_grps = sf.groupby('X1', gl.aggregate.COUNT('X1'))

Retain the top 49 categories

# Convert to a Python set so the membership test in the next step is fast
goodcategories = set(sf_grps.topk('Count', k=49)['X1'])

For all values that are not among the good categories, bucket them into a 'junk' category:

sf['X1processed'] = sf['X1'].apply(lambda x: x if x in goodcategories else 'junk')

You don't have to worry about doing this during predict time. The model stores the categories that it saw during training time, so it will automatically treat any unseen categories as junk categories which would then result in sensible predictions.
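
Putting the steps together, a small helper along these lines could be applied to each high-cardinality column (just a sketch, not an existing GraphLab API; 'top_k' and 'junk_label' are made-up parameter names):

def bucket_rare_categories(sf, column, top_k=49, junk_label='junk'):
    # Count occurrences of each category and keep the top_k most frequent
    counts = sf.groupby(column, gl.aggregate.COUNT(column))
    keep = set(counts.topk('Count', k=top_k)[column])
    # Map every other value to a single junk category
    return sf[column].apply(lambda x: x if x in keep else junk_label)

sf['X1processed'] = bucket_rare_categories(sf, 'X1')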


User 1067 | 12/18/2014, 10:10:50 AM

Hi Srikris,

Thanks for your answer and advice (it will be very useful). The output of the logistic regression with LBFGS is:

[10:46:38] [INFO] [dku.utils] - PROGRESS: Logistic regression:
[10:46:38] [INFO] [dku.utils] - --------------------------------------------------------
[10:46:38] [INFO] [dku.utils] - Number of examples          : 40428967
[10:46:38] [INFO] [dku.utils] - Number of feature columns   : 23
[10:46:38] [INFO] [dku.utils] - Number of unpacked features : 23
[10:46:38] [INFO] [dku.utils] - PROGRESS: Number of coefficients : 40462297
[10:46:38] [INFO] [dku.utils] - PROGRESS: Starting L-BFGS
[10:46:38] [INFO] [dku.utils] - --------------------------------------------------------
[10:46:52] [INFO] [dku.utils] - PROGRESS: Iter  Grad-Norm  Loss       Step size  Elapsed time
[10:46:52] [INFO] [dku.utils] - PROGRESS: 0     1.335e+07  2.802e+07  1.000e-06  14.07s
[10:48:27] [INFO] [dku.utils] - PROGRESS: 1     5.129e+06  1.992e+07  1.000e+00  108.89s
[10:49:49] [INFO] [dku.utils] - PROGRESS: 2     4.256e+06  1.064e+07  3.000e+00  191.04s
[10:50:10] [INFO] [dku.utils] - PROGRESS: 3     3.299e+06  7.741e+06  3.000e+00  212.41s
[10:50:32] [INFO] [dku.utils] - PROGRESS: 4     2.510e+06  4.208e+06  3.000e+00  233.67s
[10:50:53] [INFO] [dku.utils] - PROGRESS: 5     1.263e+06  2.939e+06  3.000e+00  255.29s
[10:51:15] [INFO] [dku.utils] - PROGRESS: 6     8.709e+05  1.873e+06  3.000e+00  276.70s
[10:51:37] [INFO] [dku.utils] - PROGRESS: 7     8.394e+04  1.268e+06  3.000e+00  298.98s
[10:51:59] [INFO] [dku.utils] - PROGRESS: 8     1.759e+05  4.495e+05  3.000e+00  321.18s
[10:52:22] [INFO] [dku.utils] - PROGRESS: 9     2.187e+05  1.252e+05  3.000e+00  343.95s
[10:52:44] [INFO] [dku.utils] - PROGRESS: 10    2.903e+04  4.145e+04  3.000e+00  366.10s

As you can see, the step size doesn't decrease with the number of iterations. Do you think it might just need more iterations?


User 91 | 12/18/2014, 4:29:29 PM

Hanna,

It looks like the algorithm is converging: the loss is decreasing steadily. For LBFGS, the step size doesn't need to decrease with the number of iterations; for the most part it should remain constant (or fluctuate by very small amounts). You could run it for a few more iterations and measure the accuracy of the predictions to see whether you get a better model.
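
If it helps, giving LBFGS more iterations is just a matter of raising the iteration cap when creating the model. A sketch (the run above stopped at iteration 10, which suggests it hit the default cap; 'target', the feature list and 'valid' are placeholders for your own columns and held-out data):

# Allow LBFGS more iterations, then check held-out accuracy
model = gl.logistic_classifier.create(sf, target='target',
                                      features=['X1processed'],
                                      solver='lbfgs',
                                      max_iterations=50)
results = model.evaluate(valid)   # 'valid' is a held-out SFrame
print(results['accuracy'])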