About Topic Modeling toolkit

User 37 | 3/4/2014, 3:12:57 PM

Hi, I'm going to try the Topic Modeling toolkit (topicmodeling/cgslda.cpp).

I find the implementation awkward. Why not place the sampling on the edge (does GraphLab have an edge function, as opposed to a vertex function?) and have the edge update the topic assignments of its vertices?

Thanks, Cui

Comments

User 20 | 3/4/2014, 5:51:04 PM

The sampling is effectively performed on the edge. There is no "edge function", but the vertex function does have a "gather phase", which essentially performs a parallel reduction over the vertex's adjacent edges (that's the gather function). The result of the gather is then stored on the vertex during the apply phase.
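
To make the gather/apply idea concrete, here is a minimal standalone sketch in plain C++. It does not use the GraphLab API: Edge, Vertex, GatherType, gather, combine, apply and gather_all are all hypothetical names made up for illustration, and the reduction runs sequentially here, whereas a real engine could evaluate the per-edge gathers in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not GraphLab types.
struct Edge {
    std::size_t source;             // e.g. a doc vertex id
    std::size_t target;             // e.g. a word vertex id
    std::vector<int> topic_counts;  // per-topic token counts stored on the edge
};

struct Vertex {
    std::vector<int> topic_counts;  // aggregated per-topic counts
};

using GatherType = std::vector<int>;

// gather: evaluated once per adjacent edge, returns that edge's contribution.
GatherType gather(const Edge& e) { return e.topic_counts; }

// combine: the reduction operator (elementwise sum of two gather results).
GatherType combine(GatherType a, const GatherType& b) {
    assert(a.size() == b.size());
    for (std::size_t k = 0; k < a.size(); ++k) a[k] += b[k];
    return a;
}

// Reduce the gather results over every edge touching vertex `vid`.
GatherType gather_all(std::size_t vid, const std::vector<Edge>& edges,
                      std::size_t num_topics) {
    GatherType total(num_topics, 0);
    for (const Edge& e : edges) {
        if (e.source == vid || e.target == vid) {
            total = combine(std::move(total), gather(e));
        }
    }
    return total;
}

// apply: store the reduced result on the vertex.
void apply(Vertex& v, const GatherType& total) { v.topic_counts = total; }
```

For a vertex v with id vid, calling apply(v, gather_all(vid, edges, num_topics)) leaves v holding the sum of the counts on all of its edges, which is the "reduce over the edges, then store on the vertex" pattern described above.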


User 37 | 3/4/2014, 6:42:10 PM

Thanks for the explanation.

Here's my understanding:

(Correct me if I'm wrong.) The sampling logic is implemented in the scatter() function of the vertices. Both the doc vertices and the word vertices do the sampling, and each updates its own vertex data as well as its neighbor's. So for each token, both the doc vertex and the word vertex do the sampling, and they can end up with different results.

So why not have just one side do the sampling, for example only the doc vertices?

Thanks, Cui
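
For reference, the per-token step under discussion can be sketched as follows, assuming the toolkit implements the standard collapsed Gibbs sampler for LDA (as the file name cgslda.cpp suggests). This is not code from cgslda.cpp: topic_weights, sample_topic and all parameter names are hypothetical. It uses the textbook conditional, where the weight of topic k is (n_dk + alpha) * (n_wk + beta) / (n_k + V * beta), with the token's current assignment already removed from the counts.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Unnormalized collapsed Gibbs weights for one token's topic.
// n_dk: per-topic counts for the token's document (lives on the doc vertex)
// n_wk: per-topic counts for the token's word (lives on the word vertex)
// n_k:  global per-topic counts
// All counts are assumed to exclude the token being resampled.
std::vector<double> topic_weights(const std::vector<int>& n_dk,
                                  const std::vector<int>& n_wk,
                                  const std::vector<int>& n_k,
                                  double alpha, double beta,
                                  std::size_t vocab_size) {
    assert(n_dk.size() == n_wk.size() && n_wk.size() == n_k.size());
    std::vector<double> w(n_dk.size());
    for (std::size_t k = 0; k < w.size(); ++k) {
        w[k] = (n_dk[k] + alpha) * (n_wk[k] + beta) /
               (n_k[k] + static_cast<double>(vocab_size) * beta);
    }
    return w;
}

// Draw a topic index with probability proportional to its weight.
std::size_t sample_topic(const std::vector<double>& weights, std::mt19937& rng) {
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return dist(rng);
}
```

The weights depend on counts from both endpoints of the doc-word edge, so whichever side runs the step needs the other side's counts. Two independent draws from the same weights can of course return different topics.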