Can we have SFrame.unpack(..., fill_na=0)?

User 1129 | 1/8/2015, 8:42:11 AM

In the following code I try to create an SFrame from a list of dicts. Since not every dict contains every column, I would like to convert the missing values to 0:

<code class="CodeBlock">
import graphlab as gl

ee = [{'a': '1a', 'b': '1b'}, {'a': '2a', 'c': '2c'}]
tbl = gl.SFrame(ee).unpack('X1')
print(tbl)
</code>

Here's the output:

<code class="CodeBlock">
+------+------+------+
| X1.a | X1.b | X1.c |
+------+------+------+
|  1a  |  1b  | None |
|  2a  | None |  2c  |
+------+------+------+
[2 rows x 3 columns]
</code>

Currently the only way I see to do this is:

<code class="CodeBlock">
column_names = tbl.column_names()
for c in column_names:
    tbl = tbl.fillna(c, 0)  # fill missing values one column at a time
print(tbl)
</code>

This seems far from optimal for large data sets: if the data contains N columns, it is copied N times. Moreover, I could save some of those copies if I could skip the columns that have no missing values, as in the following pandas code:

<code class="CodeBlock">
for c in column_names:
    if not tbl[c].isnull().any():
        continue  # no missing values in this column
    # do stuff
</code>

Comments

User 954 | 1/8/2015, 7:26:42 PM

The code you suggested is in fact efficient:
<pre class='CodeBlock'><code>
column_names = tbl.column_names()
for c in column_names:
    tbl = tbl.fillna(c, 0)
print(tbl)
</code></pre>
Note that an SFrame is a collection of SArrays. So the following line does not copy the entire SFrame; it copies only the SArray associated with column 'c'. The rest of the columns are not copied.
<pre class='CodeBlock'><code>
tbl = tbl.fillna(c, 0)
</code></pre>
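To make the idea concrete, here is a toy model in plain Python. It is only a sketch of column-wise storage, not how SFrame is actually implemented: the dict-of-lists layout and the helper names `fill_missing` and `fill_all` are hypothetical stand-ins. Filling one column rebuilds that column only, and the untouched columns are shared by reference rather than copied.

```python
def fill_missing(columns, name, value):
    """Return a new column mapping in which only column `name` is rebuilt."""
    new_columns = dict(columns)  # shallow copy: copies references, not column data
    new_columns[name] = [value if x is None else x for x in columns[name]]
    return new_columns


def fill_all(columns, value):
    """Fill every column, skipping columns that have no missing values."""
    for name in list(columns):
        if None in columns[name]:
            columns = fill_missing(columns, name, value)
    return columns


# Toy stand-in for the unpacked table: one Python list per column.
table = {
    "X1.a": ["1a", "2a"],
    "X1.b": ["1b", None],
    "X1.c": [None, "2c"],
}

filled = fill_all(table, 0)
print(filled["X1.b"])                    # ['1b', 0]
print(filled["X1.c"])                    # [0, '2c']
print(filled["X1.a"] is table["X1.a"])   # True: this column was never copied
```

In this model the cost of each fill is proportional to the size of one column, so looping over N columns touches the data roughly once in total, not N times.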

I hope it helps.