groupby in sFrame

User 357 | 6/24/2014, 12:51:49 AM

Hi dear Graphlabers,

I have a question about aggregating in an SFrame. I would like to count rows by the levels of a categorical variable. For example, 'categoricalVar' has category levels 1, 2, and 3, and the goal is to get the count for each level.

sf.groupby("colName", {'count':gl.aggregate.COUNT('categoricalVar'==3)}) does not work. Any suggestions? (Forgive my poor Python skills; I feel this is a trivial question.)

Thanks! Shirley


User 18 | 6/24/2014, 11:09:19 PM

Assuming the name of the column for 'categoricalVar' is 'colName', try this:

levelcounts = sf.groupby('colName', gl.aggregate.COUNT)
levelcounts[levelcounts['colName'] == 3]  # gives you the entire row
levelcounts[levelcounts['colName'] == 3]['Count']  # gives you an SArray containing one element, which is the count
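
For readers who want to see what a grouped count computes without GraphLab installed, here is a minimal plain-Python sketch. It uses collections.Counter from the standard library rather than the GraphLab API, and the sample values are made up for illustration.

```python
from collections import Counter

# Hypothetical sample values standing in for the contents of sf['colName'].
col_name = [1, 3, 2, 3, 3, 1]

# One count per distinct level, analogous to a grouped COUNT.
level_counts = Counter(col_name)

# Look up the count for a single level, here level 3.
count_of_3 = level_counts[3]

print(dict(level_counts))
print(count_of_3)
```

The Counter plays the role of the grouped result: each key is a level and each value is the number of rows at that level.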

Or, if you are only interested in the count of a specific level, you can filter the SFrame down to the matching rows and then take the number of rows:

sf[sf['colName'] == 3].num_rows()
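
The filter-then-count idea can be sketched in plain Python as well. This is only an illustration on a made-up list, not GraphLab code. It also hints at why the original attempt failed: comparing the column name (a string) with 3 is evaluated by Python right away and is just False, so no per-row condition ever reaches the aggregator.

```python
# Hypothetical sample values standing in for the contents of sf['colName'].
col_name = [1, 3, 2, 3, 3, 1]

# Element-wise comparison: one boolean per row, similar in spirit to
# sf['colName'] == 3.
mask = [value == 3 for value in col_name]

# Counting the True entries mirrors filtering the rows and calling num_rows().
count_of_3 = sum(mask)

# Comparing the column *name* with 3 is a plain string-vs-int comparison.
name_comparison = ('categoricalVar' == 3)

print(count_of_3)
print(name_comparison)
```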

Or there may be easier/less verbose ways of doing this.

User 357 | 6/25/2014, 9:18:40 PM

Thanks Alice! :)