Another triple_apply peculiarity

User 1129 | 1/29/2016, 8:49:57 PM

I've found another interesting edge case that I haven't been able to solve (yet). Please let me know if you understand what's going on. The notebook with the full data is available as a GitHub gist. Here's a short description:

We have an edge definition table that looks like this:

```
__src_id,__dst_id,edge_type,weight
"C_56","C_36","C_C",1
"C_53","B_20","C_B",1
"B_12","A_02","B_A",1
"B_00","A_00","B_A",1
"C_02","B_04","C_B",1
"C_80","B_31","C_B",1
"C_80","C_28","C_C",1
"B_34","A_06","B_A",1
"B_33","A_06","B_A",1
"B_35","A_06","B_A",1
"C_02","B_05","C_B",1
```

There are 150 edges. Each edge's type is built from the first letters of its two vertices: C_C, C_B, B_A, etc. To make sure that the types are valid, we may use the following function:

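```python
import graphlab as gl
import numpy as np

def validate_edge_types(gr):
    '''Print True or False, depending on edge type validity.
    Also, return a binary array of valid edge types.
    '''
    # 1 where the stored type matches "<src letter>_<dst letter>", else 0
    sel = gl.SArray(["%s_%s" % (r['__src_id'][0], r['__dst_id'][0]) == r['edge_type']
                     for r in gr.edges])
    print(np.all(sel))
    return sel
```

I also define a super simple function to be used by triple_apply:

```python
def do_stuff(src, edge, dst):
    # Effectively a no-op: dividing by 1.0 keeps the value (as a float)
    edge['weight'] /= 1.0
    return src, edge, dst
```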
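The type rule that `validate_edge_types` checks can also be tried without GraphLab or NumPy. This is a minimal plain-Python sketch, assuming vertex IDs of the form `<letter>_<digits>` as in the sample table; `SAMPLE`, `expected_edge_type` and `invalid_rows` are hypothetical names used only for illustration.

```python
import csv
import io

# A few rows copied from the edge table in the post.
SAMPLE = '''__src_id,__dst_id,edge_type,weight
"C_56","C_36","C_C",1
"C_53","B_20","C_B",1
"B_12","A_02","B_A",1
"C_80","C_28","C_C",1
'''

def expected_edge_type(src_id, dst_id):
    """Build the type from the first letters of both vertex IDs (assumed ID format)."""
    return "%s_%s" % (src_id[0], dst_id[0])

def invalid_rows(rows):
    """Return rows whose stored edge_type disagrees with the type derived from their endpoints."""
    return [r for r in rows
            if expected_edge_type(r['__src_id'], r['__dst_id']) != r['edge_type']]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(len(rows), invalid_rows(rows))  # 4 []
```

Running the same check on edges exported before and after `triple_apply` is one way to confirm that the input data itself is consistent.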

When I load the data, the resulting SGraph is valid. However, look what happens when I use triple_apply to create another graph:

```python
gg = gl.SGraph(edges=gl.load_sframe('testedges.csv'))
validate_edge_types(gg);  # prints "True"

gg_after = gg.triple_apply(do_stuff, ['weight'])
sel = validate_edge_types(gg_after)  # prints "False"
print(gg_after.edges[1 - sel])
```

The second call to validate_edge_types prints "False", and the last statement prints the edges whose edge types have changed. In my case:

```
+----------+----------+-----------+--------+
| __src_id | __dst_id | edge_type | weight |
+----------+----------+-----------+--------+
| C_73     | B_24     | C_C       | 1      |
| C_12     | C_41     | C_B       | 1      |
| C_05     | B_14     | C_C       | 1      |
| C_82     | C_16     | C_B       | 1      |
| C_23     | C_36     | C_B       | 1      |
| C_22     | B_20     | C_C       | 1      |
| B_32     | A_06     | C_B       | 1      |
| C_57     | B_54     | B_A       | 1      |
| C_95     | C_16     | C_B       | 1      |
| C_41     | B_47     | C_C       | 1      |
+----------+----------+-----------+--------+
```

What am I missing here? Why doesn't triple_apply keep edge order?
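For comparison, here is a simplified, sequential model of the behavior the post expects from `triple_apply`: call the function once per edge with that edge's own data and endpoints, and write back only the fields listed as mutated. This is a sketch under stated assumptions, not GraphLab's implementation (which runs in parallel over its own storage); `reference_triple_apply` is a hypothetical name, and `do_stuff` is repeated from the post so the block runs on its own.

```python
def do_stuff(src, edge, dst):
    edge['weight'] /= 1.0
    return src, edge, dst


def reference_triple_apply(vertices, edges, fn, mutated_fields):
    """Hypothetical sequential model of triple_apply (not GraphLab's implementation).

    vertices: dict mapping vertex id -> dict of vertex fields
    edges: list of edge dicts carrying '__src_id' and '__dst_id'
    fn: function taking and returning (src, edge, dst) dicts
    Only fields named in mutated_fields are written back.
    """
    new_vertices = {vid: dict(data) for vid, data in vertices.items()}
    new_edges = []
    for e in edges:
        s_id, d_id = e['__src_id'], e['__dst_id']
        # Each call sees copies of *this* edge's data and its own endpoints.
        src, edge, dst = fn(dict(new_vertices[s_id]), dict(e), dict(new_vertices[d_id]))
        updated = dict(e)
        for field in mutated_fields:
            if field in src:
                new_vertices[s_id][field] = src[field]
            if field in dst:
                new_vertices[d_id][field] = dst[field]
            if field in edge:
                updated[field] = edge[field]
        # Fields not in mutated_fields (e.g. edge_type) stay attached to this edge.
        new_edges.append(updated)
    return new_vertices, new_edges


vertices = {v: {} for v in ('C_56', 'C_36', 'C_53', 'B_20')}
edges = [
    {'__src_id': 'C_56', '__dst_id': 'C_36', 'edge_type': 'C_C', 'weight': 1},
    {'__src_id': 'C_53', '__dst_id': 'B_20', 'edge_type': 'C_B', 'weight': 1},
]
_, after = reference_triple_apply(vertices, edges, do_stuff, ['weight'])
print([(e['__src_id'], e['__dst_id'], e['edge_type'], e['weight']) for e in after])
# [('C_56', 'C_36', 'C_C', 1.0), ('C_53', 'B_20', 'C_B', 1.0)]
```

Under this model only `weight` changes (to a float) and every `edge_type` stays with its original endpoints, which is exactly the invariant the reported output violates.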

Comments

User 1129 | 1/29/2016, 8:59:43 PM

I've found another similar discussion: (here), but it was never answered.


User 16 | 1/31/2016, 9:59:45 PM

Hi Bgbg -

Sorry to hear you're having issues with triple apply. I'm not sure I completely understand the problem.

Is the issue that triple apply is changing the ordering of edges?

I tried running your code but 'np' is not defined in the third cell. Is that numpy?

Thanks, Toby


User 1190 | 2/2/2016, 1:11:02 AM

Thanks for reporting the issue. I added it to the github issue tracking: https://github.com/dato-code/SFrame/issues/157


User 1129 | 2/2/2016, 9:25:09 AM

Toby: the problem is that the values of some edge fields get shuffled between edges. In other words, the edge data is no longer correct: if, before triple_apply, the edge between nodes X and Y had an edge_type of A and the edge between Y and Z had a type of B, then after applying the function the two edge types may have swapped places.
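One way to see this symptom independently of `validate_edge_types` is to pair each edge before and after the call by its `(__src_id, __dst_id)` key and report any field that was not supposed to change. A minimal plain-Python sketch, assuming each (source, destination) pair occurs at most once and that the edges have been pulled out as lists of dicts (for example by iterating over `gg.edges`, as `validate_edge_types` does); `changed_fields` is a hypothetical helper.

```python
def changed_fields(before, after, unchanged=('edge_type',)):
    """Report edges whose supposedly untouched fields differ after the call.

    before, after: iterables of edge dicts with '__src_id' and '__dst_id' keys.
    Assumes each (src, dst) pair identifies a single edge.
    """
    def key(e):
        return (e['__src_id'], e['__dst_id'])

    after_by_key = {key(e): e for e in after}
    diffs = []
    for e in before:
        a = after_by_key.get(key(e))
        if a is None:
            diffs.append((key(e), 'missing', None, None))
            continue
        for field in unchanged:
            if e[field] != a[field]:
                diffs.append((key(e), field, e[field], a[field]))
    return diffs

# Toy data reproducing the swap described above: the X-Y and Y-Z types trade places.
before = [
    {'__src_id': 'X', '__dst_id': 'Y', 'edge_type': 'A', 'weight': 1},
    {'__src_id': 'Y', '__dst_id': 'Z', 'edge_type': 'B', 'weight': 1},
]
after = [
    {'__src_id': 'X', '__dst_id': 'Y', 'edge_type': 'B', 'weight': 1.0},
    {'__src_id': 'Y', '__dst_id': 'Z', 'edge_type': 'A', 'weight': 1.0},
]
for d in changed_fields(before, after):
    print(d)
# (('X', 'Y'), 'edge_type', 'A', 'B')
# (('Y', 'Z'), 'edge_type', 'B', 'A')
```

Keying by endpoints rather than by row position means the check does not depend on the order in which the edges come back.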

np is numpy.


User 16 | 2/2/2016, 6:32:07 PM

Thanks for the info. Looks like Jay was able to identify the problem and has already filed an issue.