Local Outlier Factor in GraphLab Create

User 1768 | 4/17/2015, 11:30:39 AM

I would be very grateful for some guidance: is there any way to calculate the Local Outlier Factor (LOF) in GraphLab Create? Thanks in advance.
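For context on what is being asked, LOF can be derived from k-nearest-neighbor distances alone. The following is a minimal brute-force sketch in plain Python, not a GraphLab Create API. The function names (`knn`, `local_outlier_factor`) and the sample points are made up for illustration, and it takes exactly k neighbors per point instead of the full tie-inclusive k-distance neighborhood of the original definition.

```python
import math

def knn(points, i, k):
    """Return the k nearest neighbors of points[i] as (distance, index) pairs."""
    dists = []
    for j, p in enumerate(points):
        if j == i:
            continue
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], p)))
        dists.append((d, j))
    dists.sort()
    return dists[:k]

def local_outlier_factor(points, k):
    """Brute-force LOF for a small list of points (O(n^2) distance computations)."""
    n = len(points)
    neighbors = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor
    k_dist = [nb[-1][0] for nb in neighbors]

    # Local reachability density: inverse of the mean reachability distance.
    # Duplicate points can make the mean 0, which gives an infinite density.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: mean ratio of the neighbors' densities to the point's own density
    lof = []
    for i in range(n):
        ratios = [lrd[j] / lrd[i] for _, j in neighbors[i]]
        lof.append(sum(ratios) / len(ratios))
    return lof

# Illustrative data: four points on a unit square plus one far-away point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = local_outlier_factor(points, k=2)
```

Scores close to 1 indicate a point whose density matches that of its neighbors, while clearly larger scores indicate an outlier. The quadratic cost makes this sketch unsuitable for large graphs as written.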


User 1768 | 4/19/2015, 9:16:53 AM

Thanks so much Brian for your answer. I will try it.

User 1768 | 5/11/2015, 12:26:54 PM

Hi @brian, thanks for your suggestion. I tried the k-nearest-neighbors model in GraphLab Create, and the result is below. But I have a problem: some nodes have neighbors at distance 0. Is there any option in GraphLab Create to find the k nearest neighbors with distinct distances? What I mean is finding nearest neighbors whose distance from the query_label is greater than 0, not equal to 0.

Out[47]:
query_label  reference_label  distance  rank
3072567      2619359          0         1
3072567      572892           0         2
3072567      2262104          0         3
3072567      1624379          0         4
3072567      643797           0         5
3072525      3072525          0         1
3072525      3003724          0         2
3072525      428251           0         3
3072525      2713612          0         4
3072525      1903509          0         5
[10 rows x 4 columns]

I would appreciate your guidance.

User 1768 | 5/11/2015, 8:30:42 PM

Hi @brian. Thanks for your answer. The problem with this query is this: suppose we found 5 nearest neighbors for each node, and all 5 distances are 0. Running this query deletes all of those rows from the SFrame. My point is that each node has at least k distinct nearest neighbors whose distance is larger than 0, as opposed to different nodes with the same value. I would like to find these nearest neighbors during model creation or at query time. Does the k-nearest-neighbors model in GraphLab Create have this ability? I would appreciate your guidance.

User 1768 | 5/12/2015, 9:07:26 AM

Hi @Brian, thank you so much for your helpful guidance. The 'radius' you suggested, which I found here: https://dato.com/products/create/docs/generated/graphlab.datamatching.nearestneighbor_deduplication.create.html , is something different. It defines something like a circle around each node. In other words, it is the maximum distance from each node to a potential duplicate, so the search covers distances from 0 up to whatever value we assign, for example 3. What I actually want is radius > 0, and I don't think that option exists. If you think there is a way to do this, please let me know. Thanks again for all the time you have spent answering my questions.

User 1768 | 5/13/2015, 11:11:49 AM

Hi @Brian, and thanks so much for your attention and response. Like your previous suggestion, this solution removes all nodes with distance = 0, and in both cases we lose a lot of nodes from the original graph: it goes from 3072441 nodes to 564162 nodes. Finding the k distinct nearest neighbors needs to happen during model creation, so that the search itself returns distinct neighbors with distance larger than 0, because we can assume that every node in the graph has at least k such neighbors. After model creation and querying, the k nearest neighbors have already been found, and all we can do is remove the "query" nodes with distance = 0 along with their neighbors. What is left is not the k distinct nearest neighbors. Thanks a lot for your guidance and kindness.
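
To make the requirement concrete, here is a minimal sketch of a search that skips zero-distance candidates while it runs, so every query node still ends up with k neighbors. It is plain brute-force Python, not a GraphLab Create feature. The function name `positive_distance_knn`, the labels, and the coordinates are hypothetical, and the output row layout simply mirrors the (query_label, reference_label, distance, rank) columns shown earlier in the thread.

```python
import heapq
import math

def positive_distance_knn(points, k):
    """For each point, return up to k nearest neighbors whose distance is > 0.

    points: dict mapping a label to a coordinate tuple.
    Returns a list of (query_label, reference_label, distance, rank) rows.
    """
    rows = []
    for q_label, q in points.items():
        candidates = []
        for r_label, r in points.items():
            if r_label == q_label:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, r)))
            if d > 0:  # skip exact duplicates of the query point during the search
                candidates.append((d, r_label))
        # Keep the k closest of the remaining candidates and rank them from 1
        nearest = heapq.nsmallest(k, candidates)
        for rank, (d, r_label) in enumerate(nearest, start=1):
            rows.append((q_label, r_label, d, rank))
    return rows

# Illustrative data: "a", "b" and "c" are exact duplicates of each other
points = {"a": (0, 0), "b": (0, 0), "c": (0, 0), "d": (1, 0), "e": (0, 2)}
rows = positive_distance_knn(points, k=2)
```

Because the filter is applied before the k closest candidates are selected, no query node is dropped, unlike filtering the result table after the query. The quadratic loop would not scale to a graph with millions of nodes as written; one workaround under the same idea would be to query for more than k neighbors and then keep the first k positive-distance rows per query label.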