Regarding the topic counts output in doc_dir in GraphLab Topic Modelling Toolkit

User 32 | 4/22/2014, 7:36:08 AM

I ran the ./cgs_lda command of the GraphLab Topic Modelling Toolkit on a corpus of around 50,000 documents, but the per-document topic counts were saved for only around 6,000 documents.

I ran the following command:

./cgs_lda --corpus ./cardealersdataunibitri/doc-word-count.tsv --dictionary ./cardealersdataunibitri/dictionary.txt --topk 20 --ntopics 100 --alpha 0.1 --beta 0.01 --word_dir ./wordcounts/wordcounts --doc_dir ./doccounts/doc_counts --burnin=10800 --ncpus=4 --interval 500 | tee results.txt

I also set the number of CPU cores to 4, but only two files, "doccounts.1of2" and "doccounts.2of2", were saved in the doc_counts directory.

In addition, the last entry in the file "doccounts.1of2" is incomplete: it does not contain all of the topic values for that index.

Finally, is there a quick way to plot the likelihood value at every iteration of the cgs_lda algorithm?

Thank you very much for your time and effort.


