Can't run regression with array of features?

User 1933 | 6/23/2015, 2:08:06 PM

Hey gang - I have a data structure ("x_dyadic") that contains a "features" column. I need to transform that column and then run a regression. Here's the head of the original data:

+--------+--------+-------------------------------+
| urlid  | userid |            features           |
+--------+--------+-------------------------------+
| 128538 | 219460 | [0.0, 11.0, 0.0, 1.0, 2.0,... |
| 128538 | 321171 | [1.0, 11.0, 0.0, 1.0, 151.... |
| 146680 | 351410 | [2.0, 13.0, 0.0, 1.0, 151.... |
| 15247  | 367309 | [3.0, 7.0, 0.0, 1.0, 151.0... |
| 138381 | 396025 | [4.0, 12.0, 0.0, 31.0, 149... |
| 128538 | 461671 | [5.0, 11.0, 0.0, 10.0, 169... |
| 138381 | 468329 | [6.0, 12.0, 0.0, 10.0, 237... |
| 128538 | 528840 | [7.0, 11.0, 0.0, 1.0, 149.... |
| 128538 | 744918 | [8.0, 11.0, 0.0, 10.0, 210... |
|  2702  | 798909 | [9.0, 0.0, 0.0, 32.0, 149.... |
+--------+--------+-------------------------------+
...

I prep the data like this (this references some other data structures, but all works fine):

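import graphlab as gl
import numpy as np

def prep_regression(row):
    # row is one entry of the "features" column
    i = int(row[0])
    j = int(row[1])
    rating = row[2]
    # uses the `result` structure, which is defined elsewhere
    o_ij = (result['E_alpha'][i] + result['E_beta'][j]
            + np.inner(result['E_S'][i], result['E_Z'][j]))
    y = rating - o_ij  # regression target
    x = row[2:]        # regression inputs
    return {'x': x, 'y': y}

reg_data = gl.SFrame(x_dyadic['features'].apply(prep_regression)).unpack('X1')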

This gives me a "reg_data" frame that looks like this (which is exactly what I wanted/expected):

+-------------------------------+---------------+
|              X1.x             |      X1.y     |
+-------------------------------+---------------+
| [0.0, 1.0, 2.0, 1318.0, 5.... | 3637.65969144 |
| [0.0, 1.0, 151.0, 4192.0, ... | 3595.97746545 |
| [0.0, 1.0, 151.0, 3218.0, ... | 3060.36574852 |
| [0.0, 1.0, 151.0, 1850.0, ... | 2488.97943337 |
| [0.0, 31.0, 149.0, 796.0, ... | 4200.46169802 |
| [0.0, 10.0, 169.0, 4140.0,... |  3654.3224253 |
| [0.0, 10.0, 237.0, 105.0, ... | 6600.06079326 |
| [0.0, 1.0, 149.0, 2949.0, ... | 3665.87638767 |
| [0.0, 10.0, 210.0, 166.0, ... | 3676.42529979 |
| [0.0, 32.0, 149.0, 369.0, ... | 1453.50305146 |
+-------------------------------+---------------+
...

Now that all seems great, but when I try to run my regression:

reg = gl.regression.create(reg_data,target='X1.y',features='X1.x')

I receive the error

TypeError: Input 'features' must be a list.

Now, why the features have to be a list and can't be an array, I have no idea, but that would seem to be the problem: <code>type(reg_data['X1.x'][0])</code> gives me <code>array.array</code>. But what can I do here? I tried explicitly converting x to a list in my prep_regression function (i.e. <code>x = row[2:].tolist()</code>), but that doesn't seem to help. Any ideas?

Comments

User 1933 | 6/23/2015, 2:25:54 PM

Oops, I just realized the problem has nothing to do with my data, only with how I specified the model. The features argument itself is what needs to be a list. That is, I need to use <code>reg = gl.regression.create(reg_data, target='X1.y', features=['X1.x'])</code> instead of <code>reg = gl.regression.create(reg_data, target='X1.y', features='X1.x')</code>.
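
For anyone who hits the same error, here is a minimal sketch of one way to guard against it. The helper name <code>as_feature_list</code> and the second column name <code>'X1.z'</code> are made up for illustration; neither is part of GraphLab Create or of my data. The helper just wraps a lone column name in a list before it is handed to the features argument.

```python
def as_feature_list(features):
    """Return `features` as a list of column names.

    Hypothetical helper, not part of GraphLab Create: a lone column
    name is wrapped in a list, any other iterable of names is copied
    into a list.
    """
    if isinstance(features, str):
        return [features]
    return list(features)


# A single column name becomes a one-element list ...
single = as_feature_list('X1.x')
# ... and a list of names comes back as an equal list
# ('X1.z' is an invented column name, used only for illustration).
several = as_feature_list(['X1.x', 'X1.z'])

# The call from this thread would then look like the following
# (it needs GraphLab Create and reg_data, so it is left commented out):
# reg = gl.regression.create(reg_data, target='X1.y',
#                            features=as_feature_list('X1.x'))
```

Writing <code>features=['X1.x']</code> directly does the same job; the helper only matters if the column names come from somewhere else and might arrive as a bare string.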