User 2568 | 2/14/2016, 5:02:38 AM
I have a TRAINING data set with 7381 rows and a TESTING data set with 11171 rows. They have 14 features, most of which are dicts.
I want to use QuadraticFeatures on 12 of these features. If I create the transformation in the usual way, this takes a second or so, i.e.:

    import graphlab as gl
    from graphlab.toolkits.feature_engineering import *

    chain = QuadraticFeatures(features=all_features)
    quadratic = gl.feature_engineering.create(new_train_data, chain)
    new_train_data = quadratic.transform(new_train_data)
    new_test_data = quadratic.transform(new_test_data)

However, the problem is that all the new features end up in a single column, which means I have to use all or none of them in my models. I thought I could chain the transformations pair-wise so that each pair gets its own column, which I can then choose to use or not. I wrote:

    import itertools
    from graphlab.toolkits.feature_engineering import *

    new_train_data, new_test_data = initialise.load_data(reload_data=False)
    chain = [QuadraticFeatures(features=pair, output_column_name=",".join(pair))
             for pair in itertools.combinations(all_features, 2)]

This runs quickly and creates 66 QuadraticFeatures. But when I try to create the transform:

    quadratic = gl.feature_engineering.create(new_train_data, chain)

my server runs at 100% and after 5 minutes I give up and restart the kernel.
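
For reference, here is a rough sketch of the pairing step on its own, without GraphLab, to confirm that 12 features give 66 pairs and to show what the output column names look like. The feature names `f0` to `f11` are placeholders (the real `all_features` list is not shown here), and `time_each` is a hypothetical helper for timing one fit at a time rather than the whole chain at once.

```python
import itertools
import time

# Placeholder names: the real all_features list is not shown in this post.
all_features = ["f%d" % i for i in range(12)]

# Same pairing logic as the list comprehension above, minus the GraphLab call.
pairs = list(itertools.combinations(all_features, 2))
column_names = [",".join(pair) for pair in pairs]


def time_each(items, fit):
    """Call fit(item) for every item and return a list of (item, seconds)."""
    timings = []
    for item in items:
        start = time.time()
        fit(item)
        timings.append((item, time.time() - start))
    return timings


print(len(pairs))        # 66 pair-wise transformers for 12 features
print(column_names[:3])  # ['f0,f1', 'f0,f2', 'f0,f3']
```

Here `fit` could wrap a single-pair call such as `gl.feature_engineering.create(new_train_data, QuadraticFeatures(features=pair, output_column_name=",".join(pair)))`, which might show whether one particular pair is slow or whether the cost comes from fitting the whole chain together.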