Topic Modeling

User 209 | 4/7/2014, 5:16:19 PM

Hello!

I am using the topic modeling implementation you provide in the toolkits package. My problem is that the output does not seem to converge. I am using the daily_kos input you provide. I searched the forum for similar discussions and found that another person had a convergence problem, and you suggested using the synchronous engine. I tried that, but it still does not seem to converge. Perhaps I am not running it long enough? So my question is: how long did you run it in your tests before you concluded it had converged?

Currently, I have made the input smaller (using only 10 tokens per document and ~1500 docs in total) and am running on 5 machines for 30 minutes, and it is not converging. This is what my smaller graph looks like:

Number of words: 3386
Number of docs: 1442
Number of tokens: 39061

Comments

User 6 | 4/9/2014, 12:29:03 PM

Hi Vicky, I am not sure what the problem is. I am running the example and printing only the likelihood:

./cgslda --corpus ./dailykos/tokens --dictionary ./daily_kos/dictionary.txt 2>/dev/null | grep Likelihood

Likelihood: -7.33417e-09
Likelihood: -4.26201e+06
Likelihood: -4.1695e+06
Likelihood: -4.14181e+06
Likelihood: -4.13094e+06
Likelihood: -4.12625e+06
Likelihood: -4.11868e+06

The log likelihood goes up as it should. Please clarify what is wrong...
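
If it helps to make "converged" concrete, one rough way is to look at the relative change between successive log-likelihood values and see whether it has dropped below some tolerance. The Python sketch below is only an illustration, not part of the toolkit: the helper names, the tolerance, and the window size are all hypothetical, and skipping the first report is an assumption that it is printed before any real sampling has happened.

```python
import re

# Likelihood lines copied from the grep output in this thread.
LOG = """\
Likelihood: -7.33417e-09
Likelihood: -4.26201e+06
Likelihood: -4.1695e+06
Likelihood: -4.14181e+06
Likelihood: -4.13094e+06
Likelihood: -4.12625e+06
Likelihood: -4.11868e+06
"""

NUMBER = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"


def parse_likelihoods(text):
    """Extract the numeric value from every 'Likelihood: <number>' line."""
    return [float(v) for v in re.findall(r"Likelihood:\s*(" + NUMBER + ")", text)]


def relative_changes(values):
    """Relative change between each consecutive pair of values."""
    return [abs(b - a) / abs(a) for a, b in zip(values, values[1:])]


def has_converged(values, tol=1e-3, window=3):
    """True if the last `window` relative changes are all below `tol`.

    Both `tol` and `window` are arbitrary illustrative choices.
    """
    changes = relative_changes(values)
    return len(changes) >= window and all(c < tol for c in changes[-window:])


# Assumption: the first report is an initial value, so it is skipped.
values = parse_likelihoods(LOG)[1:]

increasing = all(b > a for a, b in zip(values, values[1:]))
print("monotonically increasing:", increasing)
print("relative changes:", [f"{c:.4f}" for c in relative_changes(values)])
print("converged at tol=1e-3:", has_converged(values, tol=1e-3))
```

On a trace like the one above, the values keep increasing while the step size shrinks, so whether you call it converged depends entirely on the tolerance you pick.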