GraphLab Topic Modeling does not converge

User 37 | 3/25/2014, 4:01:06 PM

Hi, I'm running the Topic Modeling toolkit (cgs_lda), but I find that it does not converge. What I also don't understand is that, even though I'm using only one machine, the reported total token count takes a long time to grow from 0 to the actual number of tokens.

Am I doing it correctly?

My command: mpirun -machinefile localhost.mpi ./cgs_lda --corpus /nfs/topicmodeling_data --ncpus 64 --ntopics 1000 --alpha 0.1 --beta 0.1

Output:

Finalizing graph. Finished in 40.6714 seconds.
Computing number of words and documents.
Number of words: 101636
Number of docs: 299752
Number of tokens: 99541054
Total tokens: 99541054
INFO: omni_engine.hpp(omni_engine:194): Using the Synchronous engine.
INFO: distributed_graph.hpp(finalize:702): Distributed graph: enter finalize
INFO: distributed_ingress_base.hpp(finalize:185): Finalizing Graph...
INFO: distributed_ingress_base.hpp(finalize:230): Skipping Graph Finalization because no changes happened...
INFO: distributed_graph.hpp(finalize:702): Distributed graph: enter finalize
INFO: distributed_ingress_base.hpp(finalize:185): Finalizing Graph...
Running The Collapsed Gibbs Sampler
INFO: distributed_ingress_base.hpp(finalize:230): Skipping Graph Finalization because no changes happened...
INFO: async_consistent_engine.hpp(start:1212): Spawning 10000 threads
Total Tokens: 0
Likelihood: -3.41828e-05
INFO: async_consistent_engine.hpp(start:1238): Total Allocated Bytes: 8467221680
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of likelihood. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of likelihood
Likelihood: -8.09259e+07
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of likelihood at 10.3
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of global_counts. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of global_counts
Total Tokens: 5615257
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of global_counts at 10.7
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of likelihood. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of likelihood
Likelihood: -1.61504e+08
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of likelihood at 15.6
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of global_counts. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of global_counts
Total Tokens: 11815556
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of global_counts at 16.2
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of likelihood. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of likelihood
Likelihood: -2.42148e+08
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of likelihood at 20.9
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of global_counts. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of global_counts
Total Tokens: 18041636
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of global_counts at 21.6
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of likelihood. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of likelihood
Likelihood: -3.22336e+08
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of likelihood at 26.2


Comments

User 20 | 3/25/2014, 5:44:55 PM

Hi,

  • Each machine has 64 CPUs? And how many machines are you running on?
  • You can also try --engine=synchronous and see how that behaves.
  • For large problems, printing the likelihood less often may make things go faster (the default is every 5 seconds). You can increase the interval to, say, 60 seconds with --interval=60 --lik_interval=60; see the example command after this list.
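
For example, combining these suggestions with the command from the original post (just a sketch, reusing the same corpus path and machine file):

    mpirun -machinefile localhost.mpi ./cgs_lda --corpus /nfs/topicmodeling_data \
        --ncpus 64 --ntopics 1000 --alpha 0.1 --beta 0.1 \
        --engine=synchronous --interval=60 --lik_interval=60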

User 37 | 3/26/2014, 5:18:33 PM

Hi, I was running on one machine, and that machine has 64 CPUs.

I just ran it again using the "synchronous" engine, and this time I'm using 8 machines.

I find that it converges, but I don't understand why the reported "Total Tokens:" value is not constant. I think the program counts the total tokens by map-reducing over the edges (a sketch of what I mean follows the log below), so why isn't it constant?

INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 25
INFO: synchronous_engine.hpp(start:1363): Active vertices: 101636
INFO: synchronous_engine.hpp(start:1412): Running Aggregators
Likelihood: -1.11976e+09
Total Tokens: 99540806
INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 26
INFO: synchronous_engine.hpp(start:1363): Active vertices: 299752
INFO: synchronous_engine.hpp(start:1412): Running Aggregators
Likelihood: -1.11713e+09
Total Tokens: 99540800
INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 27
INFO: synchronous_engine.hpp(start:1363): Active vertices: 101636
INFO: synchronous_engine.hpp(start:1412): Running Aggregators
Likelihood: -1.11171e+09
Total Tokens: 99540797
INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 28
INFO: synchronous_engine.hpp(start:1363): Active vertices: 299752
INFO: synchronous_engine.hpp(start:1412): Running Aggregators
Likelihood: -1.10954e+09
Total Tokens: 99540804
INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 29
INFO: synchronous_engine.hpp(start:1363): Active vertices: 101636
INFO: synchronous_engine.hpp(start:1412): Running Aggregators
Likelihood: -1.10476e+09
Total Tokens: 99540803
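
Here is a minimal sketch of the edge-based count I had in mind (the edge_data struct and count_tokens function are my own illustration, not the actual cgs_lda code); if the total were computed this way, it should be exactly constant:

    #include <graphlab.hpp>

    // Hypothetical edge data: how many times the word occurs in the document.
    struct edge_data : public graphlab::IS_POD_TYPE {
      size_t ntokens;
    };
    typedef graphlab::distributed_graph<graphlab::empty, edge_data> graph_type;

    // Map function: emit the token count stored on one edge.
    size_t count_tokens(const graph_type::edge_type& edge) {
      return edge.data().ntokens;
    }

    // Summing over all edges would not depend on the sampler's state:
    //   size_t total_tokens = graph.map_reduce_edges<size_t>(count_tokens);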

Thanks, Cui


User 20 | 3/26/2014, 5:42:16 PM

Eh... it should be constant-ish. It looks like a minor integer rounding error, from this code:

    // Integer division: when total[t] is odd, total[t]/2 drops the remainder.
    GLOBAL_TOPIC_COUNT[t] = std::max(count_type(total[t]/2), count_type(0));
    // "Total Tokens" is reported as the sum of these per-topic counts.
    sum += GLOBAL_TOPIC_COUNT[t];

The total[t]/2 will occasionally lose a few counts when total[t] is odd.
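
To see how this makes the reported total drift by a few tokens, here is a small standalone illustration (the numbers are made up):

    #include <cstddef>
    #include <iostream>

    int main() {
      // Suppose these are the aggregated (doubled) per-topic totals.
      const std::size_t total[] = {7, 8, 9};
      std::size_t sum = 0;
      for (std::size_t t = 0; t < 3; ++t)
        sum += total[t] / 2;     // 3 + 4 + 4: each odd entry drops a half
      std::cout << sum << "\n";  // prints 11, while (7 + 8 + 9) / 2 = 12
      return 0;
    }

As the topic assignments shuffle between iterations, the number of odd entries in total changes, so the reported sum wobbles slightly around the true token count.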