Question about graphlab (groupby)

User 1766 | 8/12/2015, 10:08:35 PM

I have an SFrame with columns:

[movieid, directorid, rating, raterid, ratername]

What I want:

for each (movieid, directorid) pair, get the (raterid, ratername) of the rater who gave the maximum rating.

My best guess right now is:

<code>result = df.groupby(['movieid', 'directorid'], {'maxrating': gl.aggregate.MAX('rating'), 'raterid': gl.aggregate.SELECT_ONE('raterid'), 'ratername': gl.aggregate.SELECT_ONE('ratername')})</code>

My problem is that I am not sure SELECT_ONE picks the row that corresponds to MAX('rating').

What is the correct way to do it?

Comments

User 1592 | 8/13/2015, 9:03:38 AM

Hi, try the ARGMAX aggregator, which returns the raterid of the rater who gave the maximum rating for that movie.

See the documentation here: https://dato.com/products/create/docs/graphlab.data_structures.aggregation.html
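
To make the semantics concrete, here is a minimal plain-Python sketch of "arg max per group": for each (movieid, directorid) key it keeps the row with the highest rating and returns that row's raterid and ratername. The sample rows and the helper name argmax_per_group are made up for illustration. The commented lines at the end show how the same query would probably look with gl.aggregate.ARGMAX(agg_column, out_column); that part is not executed here and assumes the column names from the question.

```python
# Hypothetical sample data using the column names from the question.
rows = [
    {"movieid": 1, "directorid": 10, "rating": 3.0, "raterid": 100, "ratername": "ann"},
    {"movieid": 1, "directorid": 10, "rating": 5.0, "raterid": 101, "ratername": "bob"},
    {"movieid": 2, "directorid": 20, "rating": 4.0, "raterid": 100, "ratername": "ann"},
    {"movieid": 2, "directorid": 20, "rating": 2.5, "raterid": 102, "ratername": "cy"},
]


def argmax_per_group(rows, key_cols, value_col, out_cols):
    """For each group key, return out_cols (plus the value) from the row with the largest value_col."""
    best = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        # Strict '>' means the first row seen wins when ratings tie.
        if key not in best or row[value_col] > best[key][value_col]:
            best[key] = row
    return {
        key: tuple(row[c] for c in out_cols) + (row[value_col],)
        for key, row in best.items()
    }


result = argmax_per_group(
    rows, ["movieid", "directorid"], "rating", ["raterid", "ratername"]
)

# Probable GraphLab Create equivalent (assumption: not run here, needs graphlab):
# import graphlab as gl
# result_sf = df.groupby(['movieid', 'directorid'],
#                        {'maxrating': gl.aggregate.MAX('rating'),
#                         'raterid': gl.aggregate.ARGMAX('rating', 'raterid'),
#                         'ratername': gl.aggregate.ARGMAX('rating', 'ratername')})
```

Unlike SELECT_ONE, each ARGMAX value is tied to the row where rating is largest. One caveat: if several raters share the maximum rating in a group, it is worth checking that the two separate ARGMAX columns refer to the same rater.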


User 1766 | 8/18/2015, 8:59:51 PM

Thank you. Your advice moved me from 200th to 25th place on the leaderboard at https://www.kaggle.com/c/icdm-2015-drawbridge-cross-device-connections/leaderboard :)


User 1592 | 8/19/2015, 3:43:32 AM

Great to hear! We would love to publish a notebook of yours in our gallery once the competition is over.

Best,