Separate files for train and test data in logistic classifier

User 4360 | 4/3/2016, 6:17:19 PM


I am working on an ad click logistic regression problem. I have been given training data that contains the click column I need to predict, and test data that doesn't have the click column. Instead of splitting the training data to create a test set, I want to use the test data I was given. How would I go about doing that?

Thank you, Sergiu


User 1774 | 4/4/2016, 10:07:03 AM

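If the test data doesn't have the click column, it can't be used for training, and you can't measure accuracy on it either. Split your training data into a training set and a validation set, train and evaluate on those, and finally apply your model to the test data to predict the clicks.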

An outline of your solution will look a bit like this (don't take it line-for-line):

```python
import graphlab as gl

traindata = gl.SFrame.read_csv('traindata.csv')  # or whatever the file is called
train, validation = traindata.random_split(0.8)

# 'click' is a 0/1 label, so use a classifier rather than linear regression
model = gl.logistic_classifier.create(train, target='click')
model.evaluate(validation)  # evaluate how you're doing

testdata = gl.SFrame.read_csv('testdata.csv')
testdata['click'] = model.predict(testdata)  # fill in the missing column
testdata.save('testdata_with_predictions.csv', format='csv')
```
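If GraphLab isn't available, the same workflow can be sketched in plain Python. This is a minimal, library-free illustration under stated assumptions, not GraphLab's implementation: the data is synthetic, the feature names `x1`/`x2` and the helpers `train_logistic`, `predict` and `accuracy` are hypothetical, and the model is a bare-bones logistic regression fit by batch gradient descent. What it shows is the shape of the workflow: split the labeled rows, score the model on the validation part, and only predict (never evaluate) on the unlabeled test rows.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def random_split(rows, fraction, seed=0):
    # Shuffle a copy and cut it in two (similar in spirit to SFrame.random_split)
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def train_logistic(rows, features, target, lr=0.5, epochs=300):
    # Hypothetical helper: fit weights and bias by batch gradient descent on log-loss
    w = [0.0] * len(features)
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(features)
        grad_b = 0.0
        for r in rows:
            p = sigmoid(b + sum(wi * r[f] for wi, f in zip(w, features)))
            err = p - r[target]
            for i, f in enumerate(features):
                grad_w[i] += err * r[f]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, rows, features):
    # Label 1 (click) when the predicted probability is at least 0.5
    w, b = model
    return [1 if sigmoid(b + sum(wi * r[f] for wi, f in zip(w, features))) >= 0.5 else 0
            for r in rows]


def accuracy(preds, rows, target):
    # Fraction of rows whose prediction matches the known label
    return sum(p == r[target] for p, r in zip(preds, rows)) / len(rows)


# Synthetic stand-ins for the two files: labeled "train" rows and unlabeled "test" rows
rng = random.Random(42)
labeled = []
for _ in range(200):
    row = {'x1': rng.uniform(-1, 1), 'x2': rng.uniform(-1, 1)}
    row['click'] = 1 if row['x1'] + row['x2'] > 0 else 0
    labeled.append(row)
unlabeled = [{'x1': rng.uniform(-1, 1), 'x2': rng.uniform(-1, 1)} for _ in range(50)]

features = ['x1', 'x2']
train, validation = random_split(labeled, 0.8)
model = train_logistic(train, features, 'click')

# Only the labeled validation rows can be scored
val_acc = accuracy(predict(model, validation, features), validation, 'click')
print('validation accuracy:', val_acc)

# The test rows have no 'click' column, so fill it in with predictions
for row, pred in zip(unlabeled, predict(model, unlabeled, features)):
    row['click'] = pred
```

A real click dataset would also need its categorical columns (ad ids, sites, and so on) encoded numerically before a hand-written model like this could use them.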