k-nearest-neighbor

User 1768 | 5/8/2015, 6:33:55 PM

Hi. I have a question about the k-nearest-neighbor example at:

https://dato.com/products/create/docs/graphlab.toolkits.nearestneighbors.html?ga=1.119998216.1842117571.1426618748#id1

In the algorithm's output, the distance from 0 to 2 is 0.3059, but the distance from 2 to 0 is 0.17029. Why are they different? Shouldn't they be the same?

+------------+----------------+----------------+------+
| querylabel | referencelabel |    distance    | rank |
+------------+----------------+----------------+------+
|     0      |       2        | 0.305941170816 |  1   |
|     0      |       1        | 0.771556867638 |  2   |
|     1      |       1        | 0.390128184063 |  1   |
|     1      |       0        | 0.464004310325 |  2   |
|     2      |       0        | 0.170293863659 |  1   |
|     2      |       1        | 0.464004310325 |  2   |
+------------+----------------+----------------+------+

Comments

User 91 | 5/8/2015, 6:40:44 PM

If you look more closely at the example, the querylabels and referencelabels do not refer to the same entities, i.e. querylabel = 0 is not the same point as referencelabel = 0; they are two different points. If you run NN with the same data as both the query and the reference, you should observe symmetric distance values.
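To make the point above concrete, here is a minimal brute-force sketch in plain Python (not GraphLab code). The `euclidean`, `knn`, and `lookup` helpers and the data points are hypothetical and chosen only for illustration. Labels are row positions within each dataset, so label 0 in the query set and label 0 in the reference set are different points unless both sets are the same.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(reference, query, k):
    """Brute-force k-nearest neighbors.

    Returns tuples of (query_label, reference_label, distance, rank),
    where each label is the row position within its own dataset.
    """
    rows = []
    for q_label, q_point in enumerate(query):
        # Distance from this query point to every reference point, ascending.
        dists = sorted(
            (euclidean(q_point, r_point), r_label)
            for r_label, r_point in enumerate(reference)
        )
        for rank, (dist, r_label) in enumerate(dists[:k], start=1):
            rows.append((q_label, r_label, dist, rank))
    return rows


def lookup(rows):
    """Index the result rows by (query_label, reference_label)."""
    return {(q, r): d for q, r, d, _ in rows}


# Hypothetical data: two different datasets.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
query = [(0.5, 0.0), (3.0, 3.0), (0.0, 1.0)]

# Query and reference differ: (0, 2) and (2, 0) involve four distinct points.
cross = lookup(knn(reference, query, k=3))
print(cross[(0, 2)], cross[(2, 0)])

# Query and reference are the same data: the distances are symmetric.
same = lookup(knn(reference, reference, k=3))
print(same[(0, 2)], same[(2, 0)])
```

In the first case the two printed values differ because the pairs involve different points; in the second case they match.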


User 1768 | 5/8/2015, 7:28:26 PM

Hi @srikris, thanks a lot for your fast answer. Sorry, I didn't get your point. Maybe I don't understand what 'querylabel' and 'referencelabel' are. I think they are the IDs of users in the graph. At least in my case, where I set label='__id', they are the user IDs. Look at the example at: https://dato.com/learn/userguide/nearestneighbors/nearestneighbors.html

They show that we can even check the result in this way, so I followed the same steps on my graph. After running:

knn.sort('distance', ascending=False).print_rows(num_rows=100, num_columns=4)

+------------+----------------+---------------+------+
| querylabel | referencelabel |    distance   | rank |
+------------+----------------+---------------+------+
|   42829    |     43192      | 653685.084616 |  5   |
|   42829    |     43805      | 567408.015033 |  4   |
|   42829    |     42652      | 520336.422389 |  3   |
|   42829    |     468393     |  369786.47142 |  2   |

I ran this check as explained at that URL:

sf_check = sf_degree[['__degree', '__triangle']]
print "distance check 1:", graphlab.distances.euclidean(sf_check[42829], sf_check[43192])

and I got this answer, which is different from the first row of the result:

distance check 1: 927.142923179

This is a separate problem, and I would like to know why this distance is not the same as in the result. But I still don't understand the first issue from my first comment either. I would appreciate it if you could guide me again.
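One possible source of a mismatch like this, offered as an assumption rather than a confirmed diagnosis: if integer indexing into the table selects a row by position, then indexing with an `__id` value picks a different row from the one the label refers to. The sketch below uses plain Python with hypothetical rows and column names (it does not call GraphLab) to show how a lookup by label and a lookup by position can give different distances.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two dicts that share the same keys."""
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))


# Hypothetical rows standing in for a table with an '__id' label column.
# Note that the '__id' values do not match the row positions.
rows = [
    {"__id": 7, "degree": 3.0, "triangle": 1.0},   # position 0
    {"__id": 2, "degree": 10.0, "triangle": 4.0},  # position 1
    {"__id": 0, "degree": 6.0, "triangle": 5.0},   # position 2
    {"__id": 1, "degree": 3.0, "triangle": 5.0},   # position 3
]
features = ["degree", "triangle"]


def features_at_position(i):
    """Feature values of the row at position i (what plain indexing gives)."""
    return {f: rows[i][f] for f in features}


# Feature values keyed by the '__id' label instead of by row position.
by_id = {row["__id"]: {f: row[f] for f in features} for row in rows}

# Distance between the points labeled 0 and 1: (6, 5) versus (3, 5).
by_label = euclidean(by_id[0], by_id[1])

# Distance between the rows at positions 0 and 1: (3, 1) versus (10, 4).
by_position = euclidean(features_at_position(0), features_at_position(1))

print("by label:", by_label)
print("by position:", by_position)
```

If this assumption applies, the manual check should first find the rows whose `__id` equals each label and then compute the distance between those rows.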


User 1768 | 5/11/2015, 8:42:14 PM

Hi @Brian. Thanks so much for your complete response. Yes, I got your point. But in my case the reference and query data are the same. Imagine that I have one SFrame 'sf_degree' with three columns, '__id', 'degree' and 'Triangle', and I create the knn with these commands:

model = graphlab.nearest_neighbors.create(sf_degree, label='__id', features=['degree', 'triangle_count'], distance='euclidean')

knn = model.query(sf_degree, label='__id', k=5)

Is this correct? What do I need to change to make it correct? There is a contradiction here, and I wonder why the distances between points are not the same as when I check them myself. Please guide me if you understand my problem. Thanks in advance again.


User 1768 | 5/26/2015, 9:56:13 AM

Hi @brian. Thanks for your kind attention. Yes, I still have this problem. But I ran into another problem on my Mac ("my start-up disk is full"), so I am trying to use Amazon AWS EC2. Once I have run everything again, I will let you know about the contradictions here. Thanks again.


User 1768 | 5/31/2015, 1:58:45 PM

Hi @Brian. I ran the kNN algorithm again and encountered the same problem that I explained in the previous comments. Do you think my model creation or query is wrong, or is there another problem? I don't understand what the problem is. I will send you my code by email. I would appreciate it if you could guide me. Thanks in advance.


User 1768 | 6/10/2015, 12:54:55 PM

Hi @Brian, thanks a lot for your kind attention and your complete response by email. I tried your solution and it works. Yes, you are right: the problem was with the label. Thanks again.