Any faster way to index an SArray?

User 4234 | 3/31/2016, 12:55:35 PM

For example, I have a list [a,b,c] and an SArray whose elements are [a,a,b,b,c,c,a].

I want to create another SArray by replacing each element of the original with its index in the list.

For the above example, the desired final result is [0,0,1,1,2,2,0]

One way I came up with is [List.index(s) for s in SArray], but it turns out to be slow.

Any faster or more convenient way?

Comments

User 1178 | 3/31/2016, 9:26:40 PM

Hi,

You will want to avoid iterating over the SArray in Python, because each row that is processed crosses from the Python layer to the C++ layer, which is very inefficient. The proper approach here is to use the SArray.apply method, which processes rows in parallel and in batches:

# convert your list to a dictionary for faster index retrieval
value_to_index_dict = {'a': 0, 'b': 1, 'c': 2}
# use SArray.apply() to process rows in parallel and in batches
sa.apply(lambda x: value_to_index_dict.get(x))
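
If the list is long, the dictionary does not have to be typed out by hand. Below is a minimal plain-Python sketch of building it from the list; the names `labels`, `values` and `value_to_index` are made up for illustration, and the list entries are assumed to be unique. It uses ordinary lists so it runs without an SArray, but the same dictionary could be passed to the apply call above.

```python
# Hypothetical stand-ins for the list and the SArray contents in the question.
labels = ['a', 'b', 'c']
values = ['a', 'a', 'b', 'b', 'c', 'c', 'a']

# Build the value -> index mapping once, instead of scanning the list
# with labels.index(...) for every element.
value_to_index = {value: index for index, value in enumerate(labels)}

# Each lookup is now a dictionary access rather than a linear search.
indices = [value_to_index.get(v) for v in values]
print(indices)  # [0, 0, 1, 1, 2, 2, 0]
```

One difference from List.index: dict.get returns None for a value that is not in the list, whereas List.index raises ValueError.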

Hope this helps.


User 4234 | 4/1/2016, 1:28:21 AM

It really works!