User 2568 | 11/10/2015, 1:43:52 AM
I'm working on the Kaggle Titanic data set. I need to impute a value for the passenger Age. A simple solution would be to use the mean; however, I wanted to use the predicted value from a model created from the existing data.
I've created the model using the rows that have a valid age, like this: age_model = gl.linear_regression.create(full_data[full_data['Age'] != None]['Fare', 'Pclass', 'Age', 'Ticket', 'Cabin', 'Embarked', 'Sex', 'FamilySize', 'Title'], target='Age', max_iterations=1000, convergence_threshold=0.01)
and I can create the predictions for the rows that don't have a valid age like this: age_model.predict(full_data[full_data['Age'] == None])
But I'm not sure how to create an Age column, replacing missing items with a prediction.
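To make the goal concrete, here is a minimal sketch of the result I'm after, in plain Python rather than GraphLab. The values and the names `ages`, `predicted_all`, and `impute` are hypothetical stand-ins. It assumes the model is asked for a prediction on every row, so the predictions line up with the original rows, and a prediction is used only where the age is missing.

```python
# Hypothetical stand-in for the Age column; None marks a missing age.
ages = [22.0, None, 38.0, None, 35.0]

# Hypothetical model output: one prediction per row of the full data,
# in the same row order as `ages`.
predicted_all = [24.1, 29.5, 36.2, 31.0, 33.8]


def impute(values, predictions):
    """Return a new list: keep known values, use the prediction where missing."""
    if len(values) != len(predictions):
        raise ValueError("values and predictions must have the same length")
    return [p if v is None else v for v, p in zip(values, predictions)]


imputed_ages = impute(ages, predicted_all)
print(imputed_ages)  # [22.0, 29.5, 38.0, 31.0, 35.0]
```

Predicting on all rows and then choosing per row avoids having to match a shorter list of predictions back to the positions of the missing values.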