User 674 | 6/9/2015, 6:07:43 PM
Dear all, I am amazed that GraphLab Create can handle 5 million points so easily.
I came up with a version of k nearest neighbor in PowerGraph, [SOURCE].
The high-level idea: load S (the given set of points) and R (the query set) from disk into the graph, then connect each node r in R to every node in S.
Invoke the setDistance engine on all R points. Gather: for each edge, gather dist and vid into the current vertex. Apply: quicksort dist and vid together, so that each distance stays paired with its vertex ID. Scatter: empty.
So in the end, each node in R holds its K nearest neighbors and their respective distances.
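For reference, the gather/apply idea above can be written as a plain C++ sketch with no PowerGraph involved. This is not the code from [SOURCE]: the names `Point`, `Neighbor`, `euclidean`, and `knn` are made up for illustration, and Euclidean distance is an assumption. It can serve as a baseline to compare results against on small inputs.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not taken from the posted code.
using Point = std::vector<double>;
using Neighbor = std::pair<double, std::size_t>;  // (dist, vid)

// Euclidean distance between two points of equal dimension (assumed metric).
double euclidean(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// "Gather": collect (dist, vid) for every point in S.
// "Apply": order by distance and keep the k smallest entries.
std::vector<Neighbor> knn(const Point& r, const std::vector<Point>& S,
                          std::size_t k) {
    std::vector<Neighbor> all;
    all.reserve(S.size());
    for (std::size_t vid = 0; vid < S.size(); ++vid) {
        all.emplace_back(euclidean(r, S[vid]), vid);
    }
    k = std::min(k, all.size());
    // partial_sort only orders the first k entries, which is all we need.
    std::partial_sort(all.begin(),
                      all.begin() + static_cast<std::ptrdiff_t>(k),
                      all.end());
    all.resize(k);
    return all;
}
```

Calling `knn(r, S, K)` once per query point r gives the same output as the graph version is meant to produce. Note that connecting every r to every s creates |R| x |S| edges, so 2000 query points against 2000 given points is already 4 million edges.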
However, there are two problems with this code:
1. It cannot use a C++ pair to group dist and vid together. If I use a pair, I get a compile error.
2. It cannot handle a larger number of points. The most I can handle at the moment is 2000. If I try with more points, I get a segmentation fault.
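One possible direction for both problems, sketched under assumptions: replace the pair with a small struct, and let the gathered value keep only the K best entries instead of every distance. As far as I understand the PowerGraph vertex-program interface, the gather type is combined with `+=` and must be serializable; please treat that as an assumption to check against the documentation. The sketch below is plain C++ with hypothetical names (`Neighbor`, `TopK`), and it leaves out the serialization hooks on purpose.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a (dist, vid) pair; names are illustrative.
struct Neighbor {
    double dist;
    std::size_t vid;
};

// Order by distance, breaking ties by vertex ID.
inline bool operator<(const Neighbor& a, const Neighbor& b) {
    return a.dist < b.dist || (a.dist == b.dist && a.vid < b.vid);
}

// Accumulator that keeps only the k nearest neighbors seen so far.
struct TopK {
    std::size_t k = 0;
    std::vector<Neighbor> items;  // sorted ascending, size() <= k

    TopK() = default;
    explicit TopK(std::size_t k_) : k(k_) {}
    TopK(std::size_t k_, Neighbor n) : k(k_) {
        if (k > 0) items.push_back(n);
    }

    // Merge another partial result, then truncate to k entries.
    // A default-constructed TopK (k == 0) adopts the other side's k.
    TopK& operator+=(const TopK& other) {
        k = std::max(k, other.k);
        std::vector<Neighbor> merged;
        merged.reserve(items.size() + other.items.size());
        std::merge(items.begin(), items.end(),
                   other.items.begin(), other.items.end(),
                   std::back_inserter(merged));
        if (merged.size() > k) merged.resize(k);
        items.swap(merged);
        return *this;
    }
};
```

With this shape, a gather step would return `TopK(K, Neighbor{dist, vid})` for each edge, and the combined result never holds more than K entries per vertex, so the apply step has nothing left to sort. Whether this removes the segmentation fault is not something the sketch can show.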
What I need help with: I want to figure out whether I am loading points into PowerGraph correctly. I would truly appreciate some pointers on how to add C++ pair support to PowerGraph. Most of all, I would love to understand what I might have done wrong that limits me to 2000 points instead of millions.
Any input or pointer is truly appreciated; please do not be shy.
Thank you very much, Qiyuan Qiu