How to get the Nth value / how to sort?

User 386 | 6/24/2014, 11:44:43 AM

I have an SFrame with a unique user ID column and a count column (how many times a user did something). What I want to get is an SFrame with the users whose count values satisfy 10% < count < 90% of all the counted events. But how?

Comments

User 14 | 6/24/2014, 3:55:02 PM

You can use sf['count'].sketch_summary() to obtain the approximate 10% and 90% quantiles; say they are stored in variables c10 and c90. Then you can use a logical filter on the SFrame to obtain the users whose counts fall between c10 and c90: selected_users_counts = sf[(sf['count'] < c90) & (sf['count'] > c10)]. Note that the element-wise "and" is written & (not &&), and each comparison needs its own parentheses.
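
To make the filtering logic concrete, here is a minimal sketch in plain Python, without SFrame. It is only a stand-in: the percentile() helper and the sample rows are hypothetical, and it computes exact nearest-rank quantiles rather than the approximate ones a sketch summary would give.

```python
# Plain-Python stand-in for the quantile filter described above.
# percentile() is a hypothetical helper, not part of GraphLab.

def percentile(values, p):
    """Return the p-th percentile (integer p in 1..100), nearest-rank method."""
    ordered = sorted(values)
    # Ceiling of p * n / 100 using integer arithmetic only.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

# (user_id, count) pairs standing in for the two SFrame columns.
rows = [("u%d" % i, c) for i, c in enumerate([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])]
counts = [c for _, c in rows]

c10 = percentile(counts, 10)
c90 = percentile(counts, 90)

# Keep only the users whose count lies strictly between the two quantiles.
selected = [(user, c) for user, c in rows if c10 < c < c90]
```

The strict comparisons mirror the 10% < count < 90% condition in the question; use <= instead if the boundary values should be kept.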


User 386 | 6/25/2014, 12:03:28 PM

Thanks! BTW, is there any possibility to: 1) sort (an SArray, for example, or an SFrame by column, etc.)? 2) use the logical filter for more "string-ish" conditions such as sf[sf['somecol'].startswith(someStr)] and the like?


User 14 | 6/25/2014, 4:33:42 PM

1) Sort is on the way; you will see it in the upcoming release. 2) A string-ish filter can be achieved using lambda functions:

sf[sf['some_col'].apply(lambda x: x.startswith('blah'))]
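
The idea is that the lambda produces one true/false value per row, and that result is then used as a mask to select rows. A rough plain-Python illustration of the same two steps, with made-up sample rows and without SFrame, might look like this:

```python
# Plain-Python illustration of "apply a predicate, then filter by the mask".
# The rows and the column name are made up for this example.
rows = [
    {"some_col": "blah-1"},
    {"some_col": "other"},
    {"some_col": "blahblah"},
]

def starts_with_blah(x):
    return x.startswith("blah")

# Step 1: evaluate the predicate on every value of the column.
mask = [starts_with_blah(row["some_col"]) for row in rows]

# Step 2: keep the rows where the predicate was true.
kept = [row for row, keep in zip(rows, mask) if keep]
```

Note that Python's string method is spelled startswith, all lowercase.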

For more SFrame/SArray functionality, please look at our API docs. http://graphlab.com/products/create/docs/graphlab.data_structures.html#classes