Only counting non null values in a aggregation

User 2568 | 7/16/2016, 2:52:00 AM

I've a devices table and a events table which I'm left joining by a deviceid. I wanted to groupby the deviceid and COUNT the number of non null events. Since this is a left join, if there isn't an event, there is a row (which I what I need) with event_id = None, but COUNT returns 1 instead of 0.

What is there a way to join these tables COUNT only the non null values, e.g., I'd like to write

sf = gender_age_train.join(events, how='left')
sf = groupby('device_id', {'count':gl.aggregate.COUNT('event_id', exclude=None)})

and have 'count' be zero for the deviceids where there isn't a matching row in events and so 'eventid' (which is in events) is None.

Thsi kind of counting is useful in feature engineering


User 2568 | 7/16/2016, 2:54:06 AM

Ah.... just realiased I can do it using

device_event_counts = train.groupby('device_id', {'events':gl.aggregate.CONCAT('event_id')})
device_event_counts['count'] = device_event_counts['events'].apply(len)

Not as elegant, but it works