switching positive class for target variable leads to LR breaking

User 2785 | 5/9/2016, 8:46:25 PM

When I switch the order of the target variable in the logistic regression model (i.e., by changing the leading character from 0 to 1), only one version of the model works.

This is the output I'm getting for the model that works, where the positive target variable is "1churned" and the base target variable is "0didnotchurn":

```
Logistic regression:

Number of examples          : 13614441
Number of classes           : 2
Number of feature columns   : 29
Number of unpacked features : 29
Number of coefficients      : 31
Starting Accelerated Gradient (FISTA)

+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
Tuning step size. First iteration could take longer than subsequent iterations.
| 1         | 2        | 0.000000  | 226.640028   | 0.706054          | 0.706589            |
| 2         | 3        | 0.000000  | 245.845978   | 0.739336          | 0.739671            |
| 3         | 4        | 0.000000  | 259.994745   | 0.738422          | 0.738631            |
| 4         | 5        | 0.000000  | 273.725786   | 0.742593          | 0.742774            |
| 5         | 6        | 0.000000  | 287.740096   | 0.744435          | 0.744600            |
| 6         | 7        | 0.000000  | 301.537535   | 0.746510          | 0.746683            |
| 7         | 8        | 0.000000  | 315.445846   | 0.748166          | 0.748445            |
| 8         | 9        | 0.000000  | 329.520746   | 0.750321          | 0.750701            |
| 9         | 10       | 0.000000  | 343.645110   | 0.752307          | 0.752685            |
| 10        | 11       | 0.000000  | 357.489586   | 0.753919          | 0.754288            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing max_iterations
```

And this is the output of the model that doesn't work, where the positive target variable is "1didnotchurn" and the base target variable is "0churned":

```
Logistic regression:

Number of examples          : 13611691
Number of classes           : 2
Number of feature columns   : 29
Number of unpacked features : 29
Number of coefficients      : 31
Starting Accelerated Gradient (FISTA)

+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
Tuning step size. First iteration could take longer than subsequent iterations.
| 1         | 2        | 0.000000  | 229.870333   | 0.705982          | 0.706819            |
| 2         | 3        | 0.000000  | 249.583010   | 0.739331          | 0.739811            |
| 3         | 4        | 0.000000  | 263.597828   | 0.738423          | 0.738631            |
| 4         | 5        | 0.000000  | 277.695706   | 0.742586          | 0.742994            |
| 5         | 6        | 0.000000  | 291.514579   | 0.744282          | 0.744872            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Terminated due to numerical difficulties.
This model may not be ideal. To improve it, consider doing one of the following:
(a) Increasing the regularization.
(b) Standardizing the input data.
(c) Removing highly correlated features.
(d) Removing inf and NaN values in the training data.
```

Any thoughts as to why this is happening?
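For what it's worth, on identical training data, swapping which label is the positive class should be a no-op for logistic regression: the refit coefficients are simply the negation of the originals, since P(positive) for one encoding equals 1 − P(positive) for the other. A minimal numpy sketch on synthetic data (not the churn dataset, and plain gradient descent rather than FISTA) illustrating this symmetry:

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, iters=2000):
    """Plain batch gradient descent for unregularized logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))    # predicted P(y = 1)
        w -= lr * X.T @ (p - y) / len(y)    # gradient of the log-loss
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=500) > 0).astype(float)

w_pos = fit_logreg(X, y)        # "churned" as the positive class
w_neg = fit_logreg(X, 1 - y)    # "did not churn" as the positive class

# The two fits are mirror images: the coefficients simply flip sign.
assert np.allclose(w_pos, -w_neg, atol=1e-6)
```

Since the label flip alone cannot change the optimization on the same examples, any difference in behavior between the two runs has to come from something else, such as the data each run actually trained on.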

Comments

User 12 | 5/10/2016, 5:43:46 PM

It looks like you are using a randomly selected validation set during training, which causes the actual training set to differ between the two runs (note the different number of examples in the two model summaries). To eliminate that as a potential cause, can you try re-running each model with the `validation_set` parameter set to `None`?

Thanks, Brian
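The suggestion above amounts to making the training set deterministic: either train on everything (`validation_set=None`) or carve out the validation set yourself with a fixed seed, so both label encodings see exactly the same rows. A minimal numpy sketch of the seeded-split idea (the fraction and seed here are illustrative, not values from the thread):

```python
import numpy as np

def random_train_mask(n_rows, train_frac=0.95, seed=None):
    """Draw a boolean mask selecting the training rows.
    With seed=None each call draws a fresh split (as an internal random
    validation split during training would); with a fixed seed the split,
    and hence the training set, is identical on every run."""
    rng = np.random.default_rng(seed)
    return rng.random(n_rows) < train_frac

n = 100_000
# Unseeded: each call uses fresh entropy, so the two "runs" generally
# train on different rows -- and different numbers of examples.
a, b = random_train_mask(n), random_train_mask(n)

# Seeded: identical training sets on every run, so any remaining
# difference between the two models is down to the label encoding alone.
c, d = random_train_mask(n, seed=42), random_train_mask(n, seed=42)
assert (c == d).all()
```

Once both runs train on the same fixed set of examples, the comparison between the two target encodings becomes apples-to-apples.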