Cross-tabulation on SFrame

User 3252 | 3/10/2016, 2:31:11 AM

Hi,

I was wondering how we can display a cross-tabulation of an SFrame, as we do with the table function in R. Example R code:

data(mtcars)
table(mtcars$gear, mtcars$cyl)

      4  6  8
  3   1  2 12
  4   8  4  0
  5   2  1  2

Comments

User 15 | 3/10/2016, 11:04:40 PM

We don't explicitly have this feature. However, you can achieve the same result with this:
<pre>
import graphlab

# Assuming mtcars is an SFrame now...
xtab = mtcars.groupby(['gear'], {'freq_count': graphlab.aggregate.FREQ_COUNT('cyl')})
xtab = xtab.unpack('freq_count', column_name_prefix='').sort('gear')
</pre>

The sort is just to make the output match R's exactly; you don't necessarily need it.
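
For anyone who wants to see what the groupby / FREQ_COUNT / unpack combination computes, here is a minimal plain-Python sketch of the same cross-tabulation. It does not use graphlab at all: the `crosstab` helper and the `pairs` list are illustrative names, and the data is rebuilt from the counts in the R output in the question. It also writes 0 for combinations that never occur, which the SFrame version may report as missing values instead.

```python
from collections import Counter, defaultdict

# (gear, cyl) pairs rebuilt from the counts in the R output in the question,
# standing in for the two mtcars columns.
pairs = ([(3, 4)] * 1 + [(3, 6)] * 2 + [(3, 8)] * 12 +
         [(4, 4)] * 8 + [(4, 6)] * 4 +
         [(5, 4)] * 2 + [(5, 6)] * 1 + [(5, 8)] * 2)


def crosstab(pairs):
    """Return {row_value: {col_value: count}} with zeros filled in."""
    # Step 1: frequency counts of the column value within each row value
    # (roughly what the FREQ_COUNT aggregation produces per group).
    counts = defaultdict(Counter)
    for row_value, col_value in pairs:
        counts[row_value][col_value] += 1

    # Step 2: spread each row's counts over every column value seen,
    # using 0 where a combination never occurs.
    col_values = sorted({c for _, c in pairs})
    return {r: {c: counts[r][c] for c in col_values}
            for r in sorted(counts)}


xtab = crosstab(pairs)
for gear, row in xtab.items():
    print(gear, row)
```

Each printed line is one gear value followed by its counts per cylinder value, which should line up with the rows of the R table.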


User 3252 | 3/12/2016, 1:44:27 AM

Hi Evan,

I loaded mtcars into an SFrame, tried the first command, and got an error. Any suggestions?

xtab = mtcars.groupby(['gear'], {'freq_count':graphlab.aggregate.FREQ_COUNT('cyl')})
AttributeError: 'module' object has no attribute 'FREQ_COUNT'


User 2568 | 3/12/2016, 3:14:07 AM

Ram, I think FREQ_COUNT is new in version 1.8.4, which was released this week. I suggest you upgrade using these instructions. It will only take 30 seconds or so.

Alternatively, use this pivot function, which I wrote some time ago. The syntax is:

pivot(mtcars, 'gear', 'cyl')


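import graphlab as gl

# `item` names the intermediate aggregate column. Its default value is an
# assumption, added so that the three-argument call shown above works.
def pivot(data, row, col, item='count', agg=gl.aggregate.COUNT("id")):
    # Aggregate each (col, row) pair, then gather every row's
    # {col value: aggregate} pairs into one dict column and unpack it.
    tab = data.groupby([col, row],
            {item: agg}).groupby([row],
            {"xyzzy": gl.aggregate.CONCAT(col, item)}).unpack('xyzzy')

    # Combinations that never occur come back as missing; show them as 0.
    for name in tab.column_names():
        tab[name] = tab[name].fillna(0)

    # Strip the "xyzzy." prefix that unpack() adds to the new columns.
    col_names = tab.column_names()
    col_names.remove(row)
    col_dict = dict((name, name.replace("xyzzy.", '')) for name in col_names)

    tab.rename(col_dict)

    return tab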
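
The two-stage idea behind pivot (aggregate each (row, col) cell, then spread each row's cells into columns and zero-fill the gaps) can also be sketched without graphlab. This is only an illustration under assumptions: `pivot_rows`, its list-of-dicts input, and the sample `hp` numbers are made up for the example, and `agg` here is any function over a list of values rather than a graphlab aggregator.

```python
from collections import defaultdict


def pivot_rows(records, row, col, value, agg=len):
    """Pivot a list of dicts: one output dict per `row` value, one key per
    `col` value, each cell holding agg(list of `value` entries)."""
    # Stage 1: collect the values that fall into each (row, col) cell.
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[row], rec[col])].append(rec[value])

    # Stage 2: build one dict per row value; cells with no data become 0,
    # mirroring the fillna(0) loop in the function above.
    col_values = sorted({c for _, c in cells})
    table = []
    for r in sorted({r for r, _ in cells}):
        out = {row: r}
        for c in col_values:
            out[c] = agg(cells[(r, c)]) if (r, c) in cells else 0
        table.append(out)
    return table


# Hypothetical sample data, not taken from mtcars.
records = [
    {"gear": 3, "cyl": 8, "hp": 150},
    {"gear": 3, "cyl": 8, "hp": 250},
    {"gear": 4, "cyl": 4, "hp": 60},
    {"gear": 4, "cyl": 6, "hp": 110},
]

counts = pivot_rows(records, "gear", "cyl", "hp")           # rows per cell
totals = pivot_rows(records, "gear", "cyl", "hp", agg=sum)  # summed hp per cell
```

With the default `agg=len` this reduces to a plain cross-tabulation; passing `sum` (or any other function over a list) gives the more general pivot.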