SFrame constructor can no longer take a pandas DataFrame?

User 515 | 10/27/2014, 8:03:42 PM

<b class="Bold">I get an error creating an SFrame from a pandas DataFrame, but I have a workaround: writing and then reading a CSV file. Here's the workaround; the error follows. </b> <pre class="CodeBlock"><code>Xtrain.to_csv(join(dsdir, 'training.csv'), sep='\t', na_rep='null', date_format="%Y-%m-%d", index=False)
Xsffromcsv = gl.SFrame.read_csv(join(dsdir, 'training.csv'), delimiter='\t', na_values='null', error_bad_lines=False, header=True)</code></pre>

<b class="Bold">Is this really unsupported, or just a type inference error? </b> <code class="CodeInline">Xsf = gl.SFrame(Xtrain)</code>

TypeError                                 Traceback (most recent call last)
<ipython-input-21-3349cb6a178a> in <module>()
----> 1 Xsf = gl.SFrame(Xtrain)

/Users/jma/anaconda/lib/python2.7/site-packages/graphlab/data_structures/sframe.pyc in __init__(self, data, format, proxy)
    424                 pass
    425             else:
--> 426                 raise ValueError('Unknown input type: ' + format)
    427
    428         sframe_size = -1

/Users/jma/anaconda/lib/python2.7/site-packages/graphlab/cython/context.pyc in __exit__(self, exc_type, exc_value, traceback)
     21     def __exit__(self, exc_type, exc_value, traceback):
     22         if not self.show_cython_trace and exc_type:
---> 23             raise exc_type(exc_value)

TypeError: Cannot infer Array type. Not all elements of array are the same type.


User 91 | 10/29/2014, 4:58:34 PM

SFrames can be constructed from pandas DataFrames, so this may be an issue on our side. Could you share your data so we can take a look?

User 91 | 10/30/2014, 5:55:32 PM

Just to clarify, a pandas DataFrame and an SFrame need not have a one-to-one correspondence. pandas can work with columns of heterogeneous types, whereas SFrames, for various reasons, do not support that right now.

For example, the following data frame cannot be converted to an SFrame:

rawdata = [[['a'], '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(rawdata)

sf = gl.SFrame(df)

TypeError: Cannot infer Array type. Not all elements of array are the same type.

However, when you save it to a CSV file,

df.to_csv('foo.csv')

you can then load it back as an SFrame where the column of heterogeneous type is inferred as a string.

sf = gl.SFrame.read_csv('foo.csv')

Columns:
    X1    int
    0     str
    1     float
    2     float

Rows: 3
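To find out which columns would trip the type inference described above, it can help to check each column's Python types before calling the SFrame constructor. The following is a minimal sketch using only pandas; the helper name `mixed_type_columns` is hypothetical and not part of pandas or GraphLab Create.

```python
import pandas as pd


def mixed_type_columns(df):
    """Return the names of columns whose values have more than one Python type.

    Hypothetical helper for illustration; not part of pandas or GraphLab.
    """
    mixed = []
    for col in df.columns:
        # Skip missing values so a NaN in a string column is not flagged.
        types = set(df[col].dropna().map(type))
        if len(types) > 1:
            mixed.append(col)
    return mixed


# Same data as in the example above: the first column mixes a list with strings.
rawdata = [[['a'], '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(rawdata)
print(mixed_type_columns(df))
```

Any column this reports is a candidate for cleaning or casting before the data frame is handed to `gl.SFrame`.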

User 5354 | 7/6/2016, 1:00:03 PM

Depending on your underlying data structures, a shortcut here could be to convert the columns of your pandas DataFrame to strings before converting to an SFrame.

For example, where the pandas DataFrame is df: df['col1'] = df['col1'].apply(lambda x: str(x))

and then importing to an SFrame via: sf = gl.SFrame(data = df)

SFrame does not seem to support some pandas DataFrame column types, so forcing them into strings and then converting back after the SFrame conversion may do the trick.
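The string-cast workaround can be limited to the columns that actually need it. The sketch below is one possible way to do that with pandas alone; the helper name `stringify_mixed_columns` and the sample column names are assumptions for illustration, and the final SFrame call is left as a comment because it requires GraphLab Create.

```python
import pandas as pd


def stringify_mixed_columns(df):
    """Return a copy of df in which mixed-type columns are cast to str.

    Hypothetical helper for illustration; not part of pandas or GraphLab.
    """
    out = df.copy()
    for col in out.columns:
        # A column counts as mixed when its values have more than one Python type.
        if len(set(out[col].map(type))) > 1:
            out[col] = out[col].map(str)
    return out


# 'col1' mixes a list with plain strings; 'col2' is uniformly float.
df = pd.DataFrame({'col1': [['a'], 'b', 'x'], 'col2': [1.2, 70.0, 5.0]})
clean = stringify_mixed_columns(df)

# With GraphLab Create imported as gl (as elsewhere in this thread):
# sf = gl.SFrame(clean)
```

Columns that were already uniform keep their original type, so only the stringified columns would need to be converted back after the SFrame is built.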