Indexing SFrames with logical "or" (|) condition

User 1933 | 2/17/2016, 3:27:11 AM

I am not getting the expected behavior when doing some conditional indexing of an SFrame. Here's a toy example showing the problem. Given

x = gl.SFrame({'rating':[None,1],'imp_rating':[0.05,1]})
print x

+------------+--------+
| imp_rating | rating |
+------------+--------+
|    0.05    |  None  |
|    1.0     |   1    |
+------------+--------+
[2 rows x 2 columns]

I would think I could get rows where imp_rating is < 0.1 OR rating is 1 like this:

print x[(x['imp_rating']<0.1) | (x['rating']==1)]

+------------+--------+
| imp_rating | rating |
+------------+--------+
|    1.0     |   1    |
+------------+--------+
[? rows x 2 columns]

But clearly it doesn't work. Bizarrely, this does:

print x[(x['imp_rating']<0.1) | (x['rating']!=None)]

+------------+--------+
| imp_rating | rating |
+------------+--------+
|    0.05    |  None  |
|    1.0     |   1    |
+------------+--------+
[? rows x 2 columns]

What am I doing wrong here?

Comments

User 12 | 2/17/2016, 10:43:29 PM

Hi @jlorince, that is indeed a bug - thank you for catching it and sending it in.

For what it's worth, just doing x['rating'] == 1 should return [0, 1], but it comes back with [None, 1]. I'll file the bug report in the SFrame repo.
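To make the difference concrete, here is a sketch in plain Python lists (not SFrame code; both helper names are made up for illustration) of the mask that comes back today versus the one that should come back. It only mirrors the [None, 1] versus [0, 1] results described above and makes no claim about how SFrame implements the comparison internally.

```python
def eq_propagating(column, value):
    # Observed behaviour described above: a missing entry stays missing.
    return [None if v is None else int(v == value) for v in column]


def eq_expected(column, value):
    # Expected behaviour: a missing entry simply does not match.
    return [int(v is not None and v == value) for v in column]


rating = [None, 1]
observed = eq_propagating(rating, 1)  # mask containing a missing value
expected = eq_expected(rating, 1)     # mask containing only 0s and 1s
```

With the expected mask, the first row would still be selected by the imp_rating < 0.1 half of the "or" condition.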

Thanks, Brian


User 1933 | 2/18/2016, 3:39:41 AM

Ok, glad to help. Definitely a strange bug. Can you confirm that my third example should work in general? I'm using it on a large SFrame and it seems to be working, but I want to be confident it's not doing something funky. Thanks!


User 5213 | 5/19/2016, 8:17:03 PM

Clearly, this bug is still there. I tried to index the SFrame based on the following logical condition involving AND: sales[sales["sqft_living"] > 2000 and sales["sqft_living"] < 4000]. It doesn't seem to work, as the resulting SFrame still contained observations with 'sqft_living' values less than 2000. My guess is that in this case only the second part of the "x AND y" statement is taken into consideration, namely y. As a workaround, I created a boolean vector separately and then indexed the SFrame with this vector.


User 1207 | 5/19/2016, 11:47:23 PM

Hey @PMe,

That behavior is not a bug; it's a strange artifact of Python operators. It is enough of a usability issue, though, that in the next version SFrame and GLC will raise an exception if you try to use "and" or "or" with an SArray.

What is happening is that Python's "and" casts the left side to a boolean, and if it's a container type, e.g. a list, tuple, or SArray, this cast is True if the container has any elements and False otherwise. If the left side evaluates to True, "and" returns the right-hand side; if it evaluates to False, it returns the left-hand side. You can see this behavior in the following example:

In [1]: [False, False] and [False, False]
Out[1]: [False, False]

In [2]: [False, False] and []
Out[2]: []

In [3]: [] and [False, False]
Out[3]: []

In [4]: [False, False] and [False]
Out[4]: [False]

However, the correct thing to do in your case is to use the bitwise & operator -- your statement should read:

sales[(sales["sqft_living"] > 2000) & (sales["sqft_living"] < 4000)]

In the next version, running your original statement will raise an error.
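The difference between the two operators can be reproduced without SFrame at all. The sketch below uses a small made-up Mask class as a stand-in for a boolean column (it is not the SArray API, and the sample values are invented) to show why "and" hands back only the second condition while & combines both.

```python
class Mask(object):
    """Hypothetical stand-in for a boolean column; not the SArray API."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        # Python's `and`/`or` use this for truthiness: non-empty means True.
        return len(self.values)

    def __and__(self, other):
        # `&` is overloadable, so it can combine masks element by element.
        return Mask(a and b for a, b in zip(self.values, other.values))


sqft = [1500, 3000, 5000]
above_2000 = Mask(v > 2000 for v in sqft)
below_4000 = Mask(v < 4000 for v in sqft)

# `and` cannot be overloaded: the left mask is non-empty, hence truthy,
# so the whole expression evaluates to the right-hand mask unchanged.
with_and = above_2000 and below_4000

# `&` calls Mask.__and__ and keeps only rows satisfying both conditions.
with_amp = above_2000 & below_4000

kept_and = [v for v, keep in zip(sqft, with_and.values) if keep]
kept_amp = [v for v, keep in zip(sqft, with_amp.values) if keep]
```

Here kept_and still contains 1500, which matches the symptom reported above (rows below 2000 surviving the filter), while kept_amp contains only the value between 2000 and 4000.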

Hope that helps! -- Hoyt