Is there any reason classifier.create can't accept categorical features as lists?

User 2568 | 1/9/2016, 12:54:39 AM

I'm working on the Kaggle Telstra Network challenge where I create the training data as follows:

et=event_type.groupby('id',{"event_type": gl.aggregate.CONCAT("event_type")})

id     | event_type
-------|------------
15376  | [event_type 42, event_type 44, event_ ...
2427   | [event_type 11]

train = train_data.join(et, on='id', how='left')

The groupby is necessary to ensure the training data has only one row per 'id'. When I use this in classifier.create,

gl.boosted_trees_classifier.create(train, target='fault_severity',
                features=['location', 'event_type'], verbose=False, random_seed=87456)

I get the error:

[ERROR] Toolkit error: Feature 'event_type' is not of type (numeric, string, array, or dictionary).

Given that classifier.create can handle categorical features as arrays and dictionaries, is there any reason it should not also accept lists?

I do have a workaround, which is essentially to convert each list into a dictionary whose keys are the list items and whose values are all 1. Still, it seems logical to me that classifier.create could accept lists directly. Just a thought.
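A minimal sketch of that workaround, for anyone who hits the same error. The helper name `list_to_indicator_dict` is made up for illustration, and the commented-out last line assumes the joined SFrame is called `train` as above and that `SArray.apply` is used to do the conversion; treat it as an outline rather than tested GraphLab code.

```python
def list_to_indicator_dict(values):
    """Turn a list of category labels into a {label: 1} dictionary.

    Rows with no events (None, e.g. after a left join) become an empty dict.
    Duplicate labels collapse into a single key.
    """
    if values is None:
        return {}
    return {value: 1 for value in values}


# Plain-Python check using labels from the table above.
row = ["event_type 42", "event_type 44"]
print(list_to_indicator_dict(row))

# Hypothetical use on the joined SFrame (assumes GraphLab Create is installed):
# train['event_type'] = train['event_type'].apply(list_to_indicator_dict, dtype=dict)
```

With the column converted to dictionaries, it is one of the types the error message lists as supported.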

Comments

User 2535 | 1/12/2016, 6:24:43 PM

Hi @Kevin_McIsaac,

Thanks for the note. I'll pass this feedback along to the team.

Best, Jon