GraphLab SFrame vs. PyTables

User 690 | 1/14/2015, 4:54:46 AM

Hi everybody, I am reasonably familiar with SFrame, but I was just looking at other Python options for out-of-memory computation. I would love it if somebody could compare GraphLab's SFrame with PyTables at a high level. If somebody thinks I am comparing apples to oranges, a comment in that spirit is also welcome. Thanks, Sunil.


User 1189 | 1/14/2015, 6:48:59 PM

There are similarities, but the two differ somewhat in their architectural objectives.

I am not too familiar with PyTables, but as far as I can tell it is optimized for math and query capabilities (its API also seems somewhat low-level).

SFrame was built to support more ML-ish / data-science-ish needs (feature engineering, extraction, removal, etc.) and is heavily optimized for fast columnar manipulation: new columns can be added, deleted, and composed in different ways cheaply. The SFrame does not require a predefined schema, and it supports both strongly typed columns (long, double, etc.) and weakly typed ones (list, dictionary), with many built-in capabilities for manipulating those types.
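To make the "cheap column manipulation" point concrete, here is a minimal sketch of the general idea, assuming columns are stored as immutable sequences. `ToyColumnFrame`, `add_column`, and `remove_column` are hypothetical names for illustration only; this is not GraphLab's SFrame API or its actual storage engine. Because no column can change, a new frame can reuse the existing columns by reference, so only the newly derived column costs any work.

```python
from typing import Any, Dict, Iterable, List, Tuple


class ToyColumnFrame:
    """Hypothetical immutable column store (not the GraphLab SFrame API)."""

    def __init__(self, columns: Dict[str, Tuple[Any, ...]]):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = dict(columns)  # copies references, never the data

    def column_names(self) -> List[str]:
        return list(self._columns)

    def __getitem__(self, name: str) -> Tuple[Any, ...]:
        return self._columns[name]

    def add_column(self, name: str, values: Iterable[Any]) -> "ToyColumnFrame":
        # Only the new column is materialized; existing columns are shared.
        new_columns = dict(self._columns)
        new_columns[name] = tuple(values)
        return ToyColumnFrame(new_columns)

    def remove_column(self, name: str) -> "ToyColumnFrame":
        # Dropping a column just omits a reference; nothing is rewritten.
        return ToyColumnFrame({k: v for k, v in self._columns.items() if k != name})


sf = ToyColumnFrame({
    "user": ("a", "b", "c"),
    "tags": (["x"], ["y", "z"], []),  # weakly typed cells: each holds a list
})
sf2 = sf.add_column("num_tags", (len(t) for t in sf["tags"]))
print(sf2["num_tags"])                 # (1, 2, 0)
print(sf2["user"] is sf["user"])       # True: shared, not copied
print(sf2.remove_column("tags").column_names())  # ['user', 'num_tags']
```

The design choice being illustrated is structural sharing: immutability is what makes it safe for several frames to point at the same column data.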

On the other hand, here is what we lack that PyTables supports: the SFrame is immutable (values cannot be modified in place), it has no indexing capabilities, and it does not support hierarchical schemas.
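As a rough illustration of what those two limitations mean in practice, the sketch below uses a plain dict of tuples as a stand-in for an immutable columnar table. The helpers `with_column_updated` and `filter_rows` are hypothetical and belong to neither GraphLab nor PyTables. "Updating" values produces a new derived column instead of writing in place, and without an index, selecting rows by value requires scanning the whole column.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical stand-in for an immutable columnar table: column name -> tuple.
Frame = Dict[str, Tuple[Any, ...]]


def with_column_updated(frame: Frame, name: str, fn: Callable[[Any], Any]) -> Frame:
    """Return a new frame whose column `name` is fn applied to each old value."""
    updated = dict(frame)  # other columns are shared by reference
    updated[name] = tuple(fn(v) for v in frame[name])
    return updated


def filter_rows(frame: Frame, name: str, predicate: Callable[[Any], bool]) -> Frame:
    """Select rows where predicate(value) holds, via a full O(n) scan (no index)."""
    keep = [i for i, v in enumerate(frame[name]) if predicate(v)]
    return {col: tuple(values[i] for i in keep) for col, values in frame.items()}


stock: Frame = {"item": ("pen", "ink", "pad"), "qty": (3, 0, 7)}
restocked = with_column_updated(stock, "qty", lambda q: q + 10)
print(stock["qty"])      # (3, 0, 7): the original is untouched
print(restocked["qty"])  # (13, 10, 17)
print(filter_rows(restocked, "qty", lambda q: q > 12))
# {'item': ('pen', 'pad'), 'qty': (13, 17)}
```

A store with in-place writes and column indexes, as PyTables offers, avoids both the derived copy and the linear scan. That is the trade-off being described here.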

That said, the SFrame's immutability is also what allows HDFS, whose files are write-once rather than updatable in place, to be used as a backend. This allows for larger tables and potentially distributed operation in the future (we are working on it).