Strange behaviour of triple_apply

User 1129 | 1/26/2016, 3:03:57 PM

I observe a very strange behaviour with triple_apply.

I use the open-source SFrame, version 1.6.

This is the graph structure I am working with: https://www.dropbox.com/s/yr7h1z8s2uo1mw9/my_graph.tar.gz?dl=0

I follow the pagerank example by defining two functions:

    def sum_weight(src, edge, dst):
        src['__total_weight'] += edge['weight']
        return src, edge, dst

    def normalize_weight(src, edge, dst):
        edge['weight'] /= src['__total_weight']
        return src, edge, dst

Then, apply the two functions on the graph, as follows:

    gr = gl.load_sgraph('my_graph')
    gr = gl.load_sgraph('/Users/boris/temp/my_graph')
    gr.vertices['__total_weight'] = 0
    gr = gr.triple_apply(sum_weight, ['__total_weight'])
    gr = gr.triple_apply(normalize_weight, ['weight'])

At this point, a ZeroDivisionError is raised. I have no idea why. Is this a bug or my mistake? Any ideas?

This is my notebook: https://gist.github.com/bgbg/2ede90c96fdd27e10743

Comments

User 1592 | 1/26/2016, 7:34:59 PM

Hi Boris, can you make sure all weights are strictly positive (non-negative and non-zero)? Maybe the sum of the weights for some vertex is zero, and then you divide by zero?
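
One way to narrow this down, offered only as a sketch: a variant of normalize_weight that names the offending vertex instead of raising a bare ZeroDivisionError. The name normalize_weight_checked is hypothetical, and the sketch assumes that an exception raised inside the function surfaces the same way the ZeroDivisionError did.

```python
def normalize_weight_checked(src, edge, dst):
    """Like normalize_weight, but report which vertex has a zero total.

    Hypothetical helper for debugging; not part of the original post.
    """
    total = src['__total_weight']
    if total == 0:
        # Fail with the vertex id so the problematic vertex can be inspected.
        raise ValueError(
            "vertex %r has __total_weight == 0" % (src['__id'],))
    edge['weight'] /= total
    return src, edge, dst
```

It would be passed to triple_apply in place of normalize_weight, with the same mutated field list (['weight']).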


User 1129 | 1/27/2016, 7:59:24 AM

All the edges have positive weights:

    (gr.edges['weight'] <= 0).sum()
    # Output: 0

Also, I should note that this problem occurs with disconnected graphs. However, as far as I understand how triple_apply works, this shouldn't matter.


User 1129 | 1/27/2016, 8:40:29 AM

I've found it (after MANY MANY hours): the problem is with self-links. When the first function (sum_weight) visits a self-link, dst and src represent the same node. Important: the two are not the same object, and src == dst does not evaluate to True either. To detect a self-link, one needs to test whether src['__id'] == dst['__id'].
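
A minimal sketch of a self-link-aware sum_weight follows. It assumes the zero total arises because, on a self-link, one of the two copies of the vertex is written back without the update; the thread does not confirm this mechanism, and the sketch may differ from the change in the pull request linked below. The names is_self_link and sum_weight_self_link_safe are hypothetical.

```python
def is_self_link(src, dst):
    """Return True when src and dst are copies of the same vertex."""
    # Compare ids: the two copies are distinct objects.
    return src['__id'] == dst['__id']


def sum_weight_self_link_safe(src, edge, dst):
    """Accumulate outgoing edge weight on src, keeping self-links consistent."""
    src['__total_weight'] += edge['weight']
    if is_self_link(src, dst):
        # Assumption: both copies are written back to the same vertex,
        # so mirror the update to keep either write-back from losing it.
        dst['__total_weight'] = src['__total_weight']
    return src, edge, dst
```

Because the functions only index into src, edge and dst, they can be tried on plain dicts before being handed to triple_apply with ['__total_weight'] as the mutated field.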


User 1129 | 1/27/2016, 8:42:56 AM

https://github.com/dato-code/how-to/pull/37