Cross Validation with Logistic Regression

User 5293 | 6/24/2016, 9:54:17 AM


I have data formed in the following way

and I am trying to do logistic regression and cross-validation:

  1. The logistic regression model looks like this:

    data['sum_area'] = data['avg_below'] + data['avg_above']
    data = graphlab.cross_validation.shuffle(data)

    train_data, test_data = data.random_split(.8, seed=2)
    train_data = graphlab.cross_validation.shuffle(train_data)
    test_data = graphlab.cross_validation.shuffle(test_data)

    features_1 = ['Pres_1', 'Pres_2', 'BPM', 'avg_below', 'avg_above', 'sum_area']

    model_1 = graphlab.logistic_classifier.create(train_data, target='output', features=features_1)

  2. When trying to do cross-validation with the model, I do the following:

    cros_1 = graphlab.toolkits.cross_validation.cross_val_score((train_data, test_data), graphlab.logistic_classifier, dict([('target', 'output'), ('features', features_1)]))

When I show the results I get

I know something is wrong in the cross-validation, but I don't know what exactly.

I would appreciate any help with how to do it properly.


User 940 | 6/24/2016, 8:31:58 PM

Hi @amrfarid140,

Since you're currently evaluating one fold, the easiest thing to do would be:
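
As a library-agnostic illustration of what evaluating a single fold means, here is a minimal sketch in plain Python. It is not GraphLab's API: `fit_majority_class` and `hold_out_accuracy` are hypothetical helpers, the majority-class "model" is only a stand-in for a real classifier, and the toy label lists stand in for the `output` column of the train and test splits.

```python
from collections import Counter


def fit_majority_class(train_labels):
    """Hypothetical stand-in for a real model: always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def hold_out_accuracy(train_labels, test_labels):
    """Fit on the training split only, then score on the held-out split."""
    predicted = fit_majority_class(train_labels)
    correct = sum(1 for label in test_labels if label == predicted)
    return correct / len(test_labels)


# Toy labels (assumed data) standing in for the 'output' column of each split.
train_labels = [1, 1, 0, 1, 0, 1]
test_labels = [1, 0, 1, 1]
accuracy = hold_out_accuracy(train_labels, test_labels)
```

The point of the sketch is only that the model never sees the test rows while fitting, so the score reflects one train/test split rather than an average over several.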


Hope this helps.

Cheers! -Piotr

User 5293 | 6/24/2016, 11:55:55 PM

Hi @piotr

Thanks for your reply. I tried to use the KFold function, since what I want is to evaluate the model across the possible combinations of train and test data.

I tried to follow this:
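
To make the idea of "every combination of train and test data" concrete, the sketch below shows how a k-fold split partitions row indices in plain Python. It is not GraphLab's `KFold` implementation; `k_fold_indices` is a hypothetical helper, and it assumes the rows have already been shuffled.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs; each row is held out in exactly one fold."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size


# Example: 10 rows split into 5 folds of 2 held-out rows each.
folds = list(k_fold_indices(n_rows=10, k=5))
```

A model would then be trained once per fold on `train_idx` and scored on `test_idx`, and the k scores averaged.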

    folds = graphlab.cross_validation.KFold(data, 5)
    params = dict([('target', 'output'), ('features', ['Pres_1', 'Pres_2', 'BPM', 'avg_below', 'avg_above', 'sum_area'])])
    cros_1 = graphlab.toolkits.cross_validation.cross_val_score(folds

    print cros_1.get_results()

but I got the same result. I'm not sure what I am getting wrong now.