How To Join

User 2078 | 7/9/2015, 8:08:49 PM

Hi Guys,

Could someone please explain to me how I could perform the sql join below using Graphlab's syntax (supposing the tables are SFrames), thanks in advance:

select a.Column1, count(distinct(case when abs(a.column2 - b.column2) <= 1 then b.column3 end)) as total_bs from SFrame1 a left join SFrame2 b on a.ID = b.ID where a.C1 > b.C1 group by a.Column1

Cheers Benji


User 18 | 7/14/2015, 4:21:42 PM

Hi Benji,

You can probably do all of it using SFrame's groupby and join, as well as SArray's logical filters and <a href="">unique</a> operations.

Can you describe in words the key steps you need to perform? I can help translate into SFrame ops.


User 1178 | 7/14/2015, 6:15:40 PM

Hi Benji,

Here is a first attempt to solve your problem, I used pseudo code here to simplify the explanation. You may replace with actual column names:

  • Left join two SFrames on 'ID' columns

` joined = sf1.join(sf2, on='ID', how='left')

rename columns. You may want to rename a few columns since we rename the columns that are the same in both SFrames to something like <col-name>.1, <col-name>.2

joined.rename(...) `

  • filter out the rows you do not want

joined['tmp_c1_c1'] = joined.apply(lambda x: x['C1'] > x['C1.1']) joined = joined[joined['tmp_c1_c1']]

  • compute the temporary column for comparing column2:

` joined['tmp_c2'] = joined.apply(lambda x: x['column3.1'] if x['column2'] - x['column2.1'] <=1 else None)

use logical filter to filter out what you do not want

joined = joined[joined['tmp_c2']] `

  • get distinct of tmp_c2

` groupednondistinct = joined.groupby('Column1', {'totalbs':gl.aggregrate.CONCAT('tmpc2')})

construct final SFrame

final = gl.SFrame({ 'totalbs' : groupednon_distinct.apply(lambda x: len(set(x['totalbs']))), 'Column1': groupednon_distinct['Column1'] })


Note -- I did not test the code above and it is meant to give you an idea of how to break a sql query into SFrame operations and what are the operations you may leverage.

Thanks Ping