User 37 | 3/25/2014, 4:01:06 PM

Hi, I'm running the Topic Modeling toolkit (cgs_lda), but I find it does not converge. And what I don't understand is that even though I use only one machine, the number of total tokens reported takes a long time to grow from 0 to the actual number of tokens.

Am I doing it correctly?

My command:
```
mpirun -machinefile localhost.mpi ./cgs_lda --corpus /nfs/topic_modeling_data --ncpus 64 --ntopics 1000 --alpha 0.1 --beta 0.1
```

Output:
Finalizing graph. Finished in 40.6714 seconds.
Computing number of words and documents.
Number of words: 101636
Number of docs: 299752
Number of tokens: 99541054
Total tokens: 99541054
INFO: omni_engine.hpp(omni_engine:194): Using the Synchronous engine.
INFO: distributed_graph.hpp(finalize:702): Distributed graph: enter finalize
INFO: distributed_ingress_base.hpp(finalize:185): Finalizing Graph...
INFO: distributed_ingress_base.hpp(finalize:230): Skipping Graph Finalization because no changes happened...
INFO: distributed_graph.hpp(finalize:702): Distributed graph: enter finalize
INFO: distributed_ingress_base.hpp(finalize:185): Finalizing Graph...
Running The Collapsed Gibbs Sampler
INFO: distributed_ingress_base.hpp(finalize:230): Skipping Graph Finalization because no changes happened...
INFO: async_consistent_engine.hpp(start:1212): Spawning 10000 threads
Total Tokens: 0
Likelihood: -3.41828e-05
INFO: async_consistent_engine.hpp(start:1238): Total Allocated Bytes: 8467221680
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of likelihood. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of likelihood
Likelihood: -8.09259e+07
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of likelihood at 10.3
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of global_counts. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of global_counts
Total Tokens: 5615257
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of global_counts at 10.7
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of likelihood. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of likelihood
Likelihood: -1.61504e+08
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of likelihood at 15.6
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of global_counts. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of global_counts
Total Tokens: 11815556
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of global_counts at 16.2
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of likelihood. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of likelihood
Likelihood: -2.42148e+08
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of likelihood at 20.9
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of global_counts. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of global_counts
Total Tokens: 18041636
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of global_counts at 21.6
INFO: distributed_aggregator.hpp(decrement_distributed_counter:787): Distributed Aggregation of likelihood. 0 remaining.
INFO: distributed_aggregator.hpp(decrement_distributed_counter:793): Aggregate completion of likelihood
Likelihood: -3.22336e+08
INFO: distributed_aggregator.hpp(decrement_finalize_counter:840): 0Reschedule of likelihood at 26.2
User 20 | 3/25/2014, 5:44:55 PM

Hi,

- Each machine has 64 CPUs? And how many machines are you running on?
- You can also try --engine=synchronous and see how that behaves.
- For large problems, printing the likelihood less often may make things go faster (the default is every 5 seconds). You can increase the interval to, say, 60 seconds with --interval=60 --lik_interval=60
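Putting these suggestions together with the command from the first post, the invocation might look like the following (the paths and machine file are taken from the original command; the extra flags use the spellings given above):

```
mpirun -machinefile localhost.mpi ./cgs_lda \
    --corpus /nfs/topic_modeling_data \
    --ncpus 64 --ntopics 1000 \
    --alpha 0.1 --beta 0.1 \
    --engine=synchronous \
    --interval=60 --lik_interval=60
```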

User 37 | 3/26/2014, 5:18:33 PM

Hi, I was running with one machine, and this machine has 64 CPUs.

I just ran it again using the "synchronous" engine, and this time I'm using 8 machines.

I find it converges, but I don't understand why the reported "Total Tokens:" is not constant. I think the program counts the total tokens by map-reducing over the edges, so why isn't it constant:

INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 25
INFO: synchronous_engine.hpp(start:1363): Active vertices: 101636
INFO: synchronous_engine.hpp(start:1412): Running Aggregators
Likelihood: -1.11976e+09
Total Tokens: 99540806
INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 26
INFO: synchronous_engine.hpp(start:1363): Active vertices: 299752
INFO: synchronous_engine.hpp(start:1412): Running Aggregators
Likelihood: -1.11713e+09
Total Tokens: 99540800
INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 27
INFO: synchronous_engine.hpp(start:1363): Active vertices: 101636
INFO: synchronous_engine.hpp(start:1412): Running Aggregators
Likelihood: -1.11171e+09
Total Tokens: 99540797
INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 28
INFO: synchronous_engine.hpp(start:1363): Active vertices: 299752
INFO: synchronous_engine.hpp(start:1412): Running Aggregators
Likelihood: -1.10954e+09
Total Tokens: 99540804
INFO: synchronous_engine.hpp(start:1314): 0: Starting iteration: 29
INFO: synchronous_engine.hpp(start:1363): Active vertices: 101636
INFO: synchronous_engine.hpp(start:1412): Running Aggregators
Likelihood: -1.10476e+09
Total Tokens: 99540803

Thanks, Cui

User 20 | 3/26/2014, 5:42:16 PM

Eh... it should be constant-ish. Looks like a minor integer rounding error.

```
// total[t] is halved with integer division, which truncates
// (and therefore drops a count) whenever total[t] is odd.
GLOBAL_TOPIC_COUNT[t] = std::max(count_type(total[t]/2), count_type(0));
sum += GLOBAL_TOPIC_COUNT[t];
```

The total[t]/2 loses a count whenever total[t] is odd, because integer division truncates. That is why the reported token total drifts by a handful of tokens from iteration to iteration.