Boolean SArrays

User 495 | 7/22/2014, 6:43:40 PM

Are there boolean arrays in graphlab or are these all ints?

When we do something like s > 20, this appears to return an SArray of type int:

<pre class="CodeBlock"><code>In [11]: s = gl.SArray([1, 2, 3])

In [12]: s > 1
Out[12]:
dtype: int
Rows: 3
[0, 1, 1]</code></pre>

Note that we can index with an int SArray, e.g. s[s], which seems strange (to me)!


User 18 | 7/22/2014, 9:08:58 PM

There is no boolean type for SArrays. Non-zero integer values are interpreted as True, zero is False. It's somewhat standard C/C++ behavior, which we adopted here.
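To make the non-zero-is-True rule concrete, here is a minimal pure-Python sketch of mask-based selection. It only mirrors the semantics described above and is not GraphLab's implementation; `select_by_mask` is a hypothetical helper that stands in for indexing an SArray with an integer SArray.

```python
def select_by_mask(values, mask):
    """Keep values[i] wherever mask[i] is non-zero (non-zero means True)."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [v for v, m in zip(values, mask) if m != 0]


s = [1, 2, 3]
mask = [int(v > 1) for v in s]      # [0, 1, 1], like the result of s > 1
print(select_by_mask(s, mask))      # [2, 3]
print(select_by_mask(s, s))         # [1, 2, 3], since every entry is non-zero
```

Under this reading, s[s] is just a filter whose mask happens to be the data itself: rows holding 0 would be dropped and everything else kept.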

User 495 | 7/22/2014, 9:27:22 PM

Thanks @alicez. I was surprised that you don't use bit arrays (or similar) for performance.

User 91 | 7/22/2014, 9:29:06 PM

Our compression of integer columns has a similar effect.
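As a rough illustration of why compressing a 0/1 integer column can approach the footprint of a bit array, the sketch below packs such a column into one bit per value. This is not the SFrame's actual compression scheme, which isn't described here; `pack_bits` and `unpack_bits` are hypothetical names used only for this example.

```python
def pack_bits(flags):
    """Pack a list of 0/non-zero ints into bytes, one bit per value."""
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag != 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_bits(packed, count):
    """Recover the first `count` values as a list of 0/1 ints."""
    return [(packed[i // 8] >> (i % 8)) & 1 for i in range(count)]


flags = [0, 1, 1, 0, 0, 0, 0, 1, 1]
packed = pack_bits(flags)
print(len(packed))                               # 2 bytes for 9 values
print(unpack_bits(packed, len(flags)) == flags)  # True
```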

User 495 | 7/22/2014, 9:38:20 PM

@srikris Ah, thanks. Are you pooling all columns (like a pandas categorical)? If so, does this mean that operations like addition and multiplication are done on the labels (i.e. you don't have to touch the actual data)?

User 91 | 7/23/2014, 5:35:11 AM

Currently, the SFrame does not have native categorical support. Our toolkits convert string columns to categorical representations using the technique that you mention.
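For readers unfamiliar with the pooling technique mentioned in the question, here is a minimal pure-Python sketch of encoding a string column as a pool of distinct labels plus integer codes. It is an assumption about the general idea only, not the toolkits' actual code; `encode_categorical` is a hypothetical helper.

```python
def encode_categorical(values):
    """Return (labels, codes): distinct labels in first-seen order and,
    for each input value, the index of its label."""
    labels = []
    index = {}
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(labels)
            labels.append(value)
        codes.append(index[value])
    return labels, codes


labels, codes = encode_categorical(["red", "blue", "red", "green", "blue"])
print(labels)  # ['red', 'blue', 'green']
print(codes)   # [0, 1, 0, 2, 1]

# An element-wise transform only needs to touch the (small) label pool;
# the per-row codes stay unchanged.
upper_labels = [label.upper() for label in labels]
print([upper_labels[c] for c in codes])
```

With this layout, an element-wise operation runs once per distinct label rather than once per row, which is the saving the question asks about.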