Cross-tabulation on SFrame

User 3252 | 3/10/2016, 2:31:11 AM


I was wondering how we can display a cross-tabulation on SFrame as we do using the table function in R. Example R code:

data(mtcars)
table(mtcars$gear, mtcars$cyl)
     4  6  8
  3  1  2 12
  4  8  4  0
  5  2  1  2


User 15 | 3/10/2016, 11:04:40 PM

We don't explicitly have this feature. However, you can achieve the same with this: <pre>

import graphlab

# Assuming mtcars is an SFrame now...
xtab = mtcars.groupby(['gear'], {'freq_count': graphlab.aggregate.FREQ_COUNT('cyl')})
xtab = xtab.unpack('freq_count', column_name_prefix='').sort('gear')
</pre>

The sort is just to make the output match R's; you don't necessarily need it.
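To make the two steps concrete (a frequency count per group, then one column per distinct value), here is a rough sketch in plain Python. It is not SFrame code: `crosstab`, `cell_counts`, and `records` are names made up for illustration, and the (gear, cyl) pairs are rebuilt from the counts in the R table in the question.

```python
from collections import Counter, defaultdict

# (gear, cyl) pairs rebuilt from the counts in the R table in the question.
cell_counts = {(3, 4): 1, (3, 6): 2, (3, 8): 12,
               (4, 4): 8, (4, 6): 4,
               (5, 4): 2, (5, 6): 1, (5, 8): 2}
records = [pair for pair, n in cell_counts.items() for _ in range(n)]


def crosstab(pairs):
    """Count (row, col) pairs; return (column labels, {row: [counts]})."""
    # Step 1: one frequency dictionary per row value (the group-by step).
    freq = defaultdict(Counter)
    for row, col in pairs:
        freq[row][col] += 1
    # Step 2: spread each dictionary into one column per distinct col value.
    cols = sorted({col for _, col in pairs})
    table = {row: [freq[row][c] for c in cols] for row in sorted(freq)}
    return cols, table


cols, table = crosstab(records)
print("    " + " ".join("%3d" % c for c in cols))
for row, counts in table.items():
    print("%3d " % row + " ".join("%3d" % n for n in counts))
```

In this sketch a combination that never occurs (gear 4 with cyl 8) comes out as 0, as it does in R. I would expect the unpacked SFrame to show a missing value in that cell instead, so you may want to fill those in afterwards.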

User 3252 | 3/12/2016, 1:44:27 AM

Hi Evan,

I loaded mtcars into an SFrame, tried the first command, and got an error. Any suggestions?

xtab = mtcars.groupby(['gear'], {'freq_count':graphlab.aggregate.FREQ_COUNT('cyl')})
AttributeError: 'module' object has no attribute 'FREQ_COUNT'

User 2568 | 3/12/2016, 3:14:07 AM

Ram, I think FREQ_COUNT is new in version 1.8.4, which was released this week. I suggest you upgrade using these instructions. It will only take 30 seconds or so.
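If you want to check before or after upgrading whether your install exposes the aggregator, something like the following should do. It is only a sketch: `aggregator_available` is a helper name made up here, and it simply reports False when the module is missing or lacks the attribute.

```python
import importlib


def aggregator_available(module_name, attr_path):
    """Return True if module_name imports and exposes the dotted attr_path."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    # Walk the dotted path one attribute at a time.
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


# Prints False when graphlab is not installed or FREQ_COUNT is missing.
print(aggregator_available("graphlab", "aggregate.FREQ_COUNT"))
```

A False result with GraphLab Create installed is the same condition that raises the AttributeError above.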

Alternatively, use this pivot function I wrote some time ago, with the syntax

pivot(mtcars, 'gear', 'cyl')

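import graphlab as gl

def pivot(data, row, col, item='count', agg=gl.aggregate.COUNT()):
    # Aggregate each (row, col) cell; by default, count its records.
    cells = data.groupby([row, col], {item: agg})
    # Gather each row's {col value: aggregate} pairs into one dictionary
    # column, then unpack it into one column per distinct col value.
    tab = cells.groupby(row,
            {"xyzzy": gl.aggregate.CONCAT(col, item)}).unpack('xyzzy')

    # Strip the "xyzzy." prefix that unpack puts on the new column names.
    col_names = tab.column_names()
    col_dict = dict((name, name.replace("xyzzy.", '')) for name in col_names)

    return tab.rename(col_dict)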