User 1933 | 11/17/2015, 5:21:32 PM
Hey gang - I'm running some topic models in Graphlab on Google cloud compute, using a node with 32 cores and 120GB ram. What tips can you offer for maximizing performance?
So far I've set:
gl.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 32)
gl.set_runtime_config('GRAPHLAB_DEFAULT_NUM_GRAPH_LAMBDA_WORKERS', 32)
gl.set_runtime_config('GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY', 100000000000)
gl.set_runtime_config('GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE', 100000000000)
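(For anyone trying to reproduce this: one quick sanity check is to read the settings back after setting them. This is a minimal sketch assuming graphlab is imported as gl; gl.get_runtime_config() returns the current configuration as a dict, per the GraphLab Create API.)

import graphlab as gl

# Read the runtime configuration back to confirm the overrides took effect.
cfg = gl.get_runtime_config()
for key in ('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS',
            'GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY'):
    print('%s = %s' % (key, cfg[key]))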
But I still only see ~15GB of RAM usage while the model is running. Anything else I can do to speed the models up? My corpus has ~150k documents, vocab size of ~112k, and ~4 billion tokens.
Of course, the biggest lever is the number of model iterations, but I don't know what best practice is there. In one test model, I didn't see large changes in perplexity on a hold-out set as I varied the iteration count (I tried 10, 20, 30, 40, and 50). Generally speaking, am I safe to use the default of 10 iterations?
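Partially answering my own question, here's the kind of sweep I ran, in case anyone wants to suggest a better protocol. This is a sketch, not a recipe: it assumes graphlab is imported as gl, that train and test are SArrays of bag-of-words dicts, and that num_topics=100 (a placeholder, not my real setting); I'm relying on topic_model.create and model.evaluate returning a 'perplexity' key as documented in GraphLab Create, so double-check against the API reference.

import graphlab as gl

results = {}
for n_iter in (10, 20, 30, 40, 50):
    # Train a fresh model at each iteration count.
    model = gl.topic_model.create(train, num_topics=100,
                                  num_iterations=n_iter)
    # Hold-out perplexity; lower is better. Stop increasing n_iter
    # once this plateaus.
    results[n_iter] = model.evaluate(train, test)['perplexity']
    print('%d iterations -> perplexity %.1f' % (n_iter, results[n_iter]))

If the curve is flat from 10 onward, that would suggest the default is fine for this corpus.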