SFrame.fillna() didn't seem to work

User 2216 | 9/4/2015, 6:30:32 PM

I have an SFrame named small with 1000+ features. When I tried applying fillna() to some of the columns that contain missing values, it didn't seem to work:

In [75]: small['VAR_0074']
Out[75]:
dtype: int
Rows: 10000
[None, None, None, 1582L, None, None, None, None, None, 936L, None, 5500L, None, 3500L, 1600L, None, 4117L, None, None, None, None, 3000L, None, None, None, 2500L, None, None, None, 2599L, None, None, 3750L, None, 2530L, 3000L, None, 4875L, 2200L, None, 3250L, None, None, None, None, None, None, None, None, None, None, None, 1963L, None, None, None, 1600L, 5000L, None, None, None, None, 5000L, None, None, None, None, None, None, None, None, 0L, 2823L, None, 3000L, None, None, None, 3241L, 1896L, 1517L, None, None, None, None, None, None, 2200L, None, None, None, None, 2400L, None, 5000L, None, None, None, None, 1248L, ... ]

In [76]: small.fillna('VAR_0074',0)
Out[76]:
....
[10000 rows x 1934 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [77]: small['VAR_0074']
Out[77]:
dtype: int
Rows: 10000
[None, None, None, 1582L, None, None, None, None, None, 936L, None, 5500L, None, 3500L, 1600L, None, 4117L, None, None, None, None, 3000L, None, None, None, 2500L, None, None, None, 2599L, None, None, 3750L, None, 2530L, 3000L, None, 4875L, 2200L, None, 3250L, None, None, None, None, None, None, None, None, None, None, None, 1963L, None, None, None, 1600L, 5000L, None, None, None, None, 5000L, None, None, None, None, None, None, None, None, 0L, 2823L, None, 3000L, None, None, None, 3241L, 1896L, 1517L, None, None, None, None, None, None, 2200L, None, None, None, None, 2400L, None, 5000L, None, None, None, None, 1248L, ... ]

I expected it to show something like: [0, 0, 0, 1582L, 0, ... ]

Is there anything I'm missing?

Comments

User 2032 | 9/7/2015, 11:19:31 AM

fillna returns a new SFrame; it does not modify the original in place :)

Try this useful snippet to fill missing values across a whole SFrame:

for c in sf.column_names():
	sf = sf.fillna(c, 0)

User 2216 | 9/8/2015, 6:51:13 PM

This is exactly what I did. I specified a single column name in fillna merely to simplify the discussion.

Speaking of which, I had to explicitly choose the "zero" value to match each variable's data type, even though the documentation says fillna will automatically convert data types. That's a bit frustrating.

Anyway, my main issue is not being able to fill null values using fillna function. Any clues?


User 1190 | 9/9/2015, 6:27:37 AM

Hi Chris_CC,

In your code, small.fillna('VAR_0074',0) does not modify small. It returns a new SFrame, which was never assigned to a variable.

So when you print small, you are looking at the unmodified SFrame.

If you do small = small.fillna('VAR_0074', 0); print small['VAR_0074'], you should get the expected result.

If not, please post a complete code example with synthetic data.
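
For reference, the reassignment pattern can be combined with the per-type fill values mentioned earlier in the thread. The following is only a rough sketch: fill_missing and FILL_DEFAULTS are made-up names, not part of the SFrame API, and it assumes the frame exposes column_names(), column_types(), and a fillna(column, value) that returns a new frame, as discussed above.

```python
# Sketch: fill missing values column by column, picking the fill value from
# each column's dtype and keeping the frame that fillna returns.
# FILL_DEFAULTS and fill_missing are hypothetical names, not SFrame API.

FILL_DEFAULTS = {int: 0, float: 0.0, str: ""}


def fill_missing(sf, defaults=FILL_DEFAULTS):
    """Return a new frame with missing values filled per column dtype.

    `sf` is assumed to behave like an SFrame: column_names(),
    column_types(), and fillna(column, value) returning a new frame.
    Columns whose dtype has no entry in `defaults` are left untouched.
    """
    for name, dtype in zip(sf.column_names(), sf.column_types()):
        if dtype in defaults:
            # fillna does not modify sf, so rebind sf to the returned frame.
            sf = sf.fillna(name, defaults[dtype])
    return sf
```

It would be called as small = fill_missing(small). Calling fill_missing(small) without assigning the result leaves small unchanged, which is the same pitfall as in In [76] above.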

Best Regards, -jay


User 2216 | 9/9/2015, 5:55:04 PM

I see. You mean the transformation doesn't happen in place, right? I'll give it another try and see if it works.

Thanks,


User 2216 | 9/10/2015, 9:39:16 PM

Just an update that it does work. Thanks a lot!