User 1262 | 8/18/2015, 11:05:24 PM
In your “Practical Deep Text Learning” notebook you are using a gensim word2vec model inside an apply() to query the model for the vector representation of each word:
dt = DeepTextAnalyzer(model)
sf['vectors'] = sf['posts'].apply(lambda p: dt.txt2avg_vector(p, is_html=True))
When I try to do the same, my memory fills up very fast and I get an exception. (I’m guessing the model’s memory footprint is large and a copy of the model is created every time the apply function is called.)
I have 24 GB of RAM and 4 processors.
If I instead query the model using a for loop then I don’t have any problems:
vectors = [querymodel(x) for x in sf['posts']]
But then I’m not parallelizing this transformation as I could with apply…
How much RAM and how many processors did you have when you ran your code?
Is there any way I could query the model from inside an apply() without filling my memory? (Somehow keep a read-only copy of the model in memory that could be shared among the workers?)
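For reference, here is a minimal sketch of the pattern I have in mind: each worker process loads the model once via a pool initializer instead of copying it per call. The tiny dict standing in for the word2vec model and the helper names (`init_worker`, `avg_vector`) are hypothetical, not from the notebook; with a real gensim model, the initializer would instead call something like Word2Vec.load(path, mmap='r') so forked workers can share the memory-mapped vector arrays.

```python
# Sketch: share one read-only model across pool workers by loading it
# once per process in an initializer, rather than once per apply() call.
from multiprocessing import Pool

_model = None  # per-process global, set once by the initializer


def init_worker():
    global _model
    # Hypothetical stand-in for the real model; with gensim one might use
    # a memory-mapped load here so forked workers share the arrays.
    _model = {"hello": [1.0, 2.0], "world": [3.0, 4.0]}


def avg_vector(post):
    """Average the vectors of the known words in one post."""
    vecs = [_model[w] for w in post.split() if w in _model]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


if __name__ == "__main__":
    posts = ["hello world", "hello", "unknown"]
    # Each of the 2 workers runs init_worker() exactly once.
    with Pool(2, initializer=init_worker) as pool:
        print(pool.map(avg_vector, posts))
```

This avoids re-serializing the model for every row; only the posts and the small averaged vectors cross process boundaries.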