Possible bug in SFrame.unique()

User 1933 | 2/23/2016, 3:28:07 PM

I'm not positive whether this is a bug, but it certainly seems a little weird.

Given an SFrame x (this is obviously a toy example):

In [19]: x = gl.SFrame({'a':[1,2,3],'b':[4,5,6],'c':[7,8,9]})

In [20]: x
Out[20]:
Columns:
    a   int
    b   int
    c   int

Rows: 3

Data:
+---+---+---+
| a | b | c |
+---+---+---+
| 1 | 4 | 7 |
| 2 | 5 | 8 |
| 3 | 6 | 9 |
+---+---+---+
[3 rows x 3 columns]

Let's say we have some meaningful column order (this is important, e.g., if you want to write to a file in a particular format) that we specify like so:

In [21]: x = x[['c','a','b']]

In [22]: x
Out[22]:
Columns:
    c   int
    a   int
    b   int

Rows: 3

Data:
+---+---+---+
| c | a | b |
+---+---+---+
| 7 | 1 | 4 |
| 8 | 2 | 5 |
| 9 | 3 | 6 |
+---+---+---+
[3 rows x 3 columns]

Now, if we call .unique() on this SFrame, it turns out that the column order is not preserved:

In [23]: x.unique()
Out[23]:
Columns:
    a   int
    b   int
    c   int

Rows: 3

Data:
+---+---+---+
| a | b | c |
+---+---+---+
| 3 | 6 | 9 |
| 2 | 5 | 8 |
| 1 | 4 | 7 |
+---+---+---+
[3 rows x 3 columns]

Now, the docs say: "Remove duplicate rows of the SFrame. Will not necessarily preserve the order of the given SFrame in the new SFrame." My reading of that (and my familiarity with similar operations) is that row order is not preserved, but that column order (which is essentially the schema of the SFrame) should not be affected.

Of course, this can be worked around when writing to file by re-selecting the columns in the desired order, with something like x[['c','a','b']].save(...), but I wanted to report this in case it is a bug. At the very least, it would be nice if the documentation were clearer on the topic.
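
For anyone hitting the same thing, here is a minimal sketch of that workaround wrapped in a helper. The name unique_preserving_columns is made up for illustration and is not part of GraphLab Create. It assumes only that the object behaves like an SFrame: it has column_names() and unique(), and indexing with a list of column names returns those columns in the listed order.

```python
def unique_preserving_columns(frame):
    """Return frame.unique() with columns back in frame's original order.

    Hypothetical helper, not a library function. Assumes `frame` behaves
    like an SFrame: column_names(), unique(), and list-of-names indexing.
    """
    original_order = frame.column_names()  # e.g. ['c', 'a', 'b']
    deduped = frame.unique()               # column order may change here
    return deduped[original_order]         # re-select to restore the order
```

With the toy SFrame above, unique_preserving_columns(x) should come back with columns c, a, b, and the result can then be saved as usual. Row order is still not guaranteed, which matches what the docs describe.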

Comments

User 19 | 2/23/2016, 5:28:47 PM

Hi jlorince,

I agree that this is undesirable. You're right that our docs were referring to row order, not column order. I have created a bug report.

Thanks! Chris