Wrong types in SGraph edges

User 1129 | 2/8/2015, 2:22:58 PM

Consider the following CSV file:
<code class="CodeInline">
userid,itemid
user1,item1
user1,item3
user2,item1
user1,item2
user2,item3
user3,item3
</code>

Now, I will read it into an SFrame and create a graph:
<code>
import graphlab as gl

tbl = gl.SFrame.read_csv('temp.csv')
grph = gl.SGraph().add_edges(tbl, src_field='userid', dst_field='itemid')
</code>

One would expect the __src_id and __dst_id fields in the edges table to be of type str. At least, that is what they look like. However, the following code raises an exception:

<code> grph.edges.filter_by(['user1', 'user2'], '__src_id') </code>

The exception is:

<code>
----> 1 grph.edges.filter_by(['user1', 'user2'], '__src_id')

/usr/local/lib/python2.7/site-packages/graphlab/data_structures/sframe.pyc in filter_by(self, values, column_name, exclude)
   3417         if given_type != existing_type:
   3418             raise TypeError("Type of given values does not match type of column '" +
-> 3419                 column_name + "' in SFrame.")
   3420
   3421         with cython_context():

TypeError: Type of given values does not match type of column '__src_id' in SFrame.
</code>

After some debugging, I realized that, for some reason, the column type of __src_id is treated as int rather than str, which is clearly wrong.
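To illustrate why a column reported as int rejects string values, here is a minimal pure-Python sketch of the kind of check visible in the traceback. It is not the GraphLab implementation: the helper name `check_filter_types` and its `column_type` parameter are hypothetical, and the requirement that the values be non-empty and share one type is an assumption made for the example.

```python
def check_filter_types(values, column_name, column_type):
    """Illustrative stand-in for the type check seen in the traceback.

    Hypothetical helper, not part of GraphLab Create.
    """
    given_types = {type(v) for v in values}
    # Assumption for this sketch: values are non-empty and share one type.
    if len(given_types) != 1:
        raise TypeError("Values must be non-empty and share one type.")
    given_type = given_types.pop()
    if given_type != column_type:
        raise TypeError(
            "Type of given values does not match type of column '"
            + column_name + "' in SFrame."
        )
    return True


# Passes when the column really is a string column.
ok = check_filter_types(['user1', 'user2'], '__src_id', str)

# Fails when the column type is (wrongly) reported as int.
try:
    check_filter_types(['user1', 'user2'], '__src_id', int)
    message = None
except TypeError as err:
    message = str(err)
```

With the column type reported as int, the string values fail the comparison and the same message as in the traceback is produced.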

Comments

User 954 | 2/9/2015, 7:05:44 PM

Hi bgbg,

We noticed this bug in glc 1.2, and it is fixed in the next release (1.3). As a workaround, please use <b class="Bold">get_edges()</b> instead of <b class="Bold">edges</b>.

<pre><code> grph.get_edges().filter_by(['user1', 'user2'], '__src_id') </code></pre>

Thanks for helping us to fix all those bugs.