Saving SFrame is very slow

User 512 | 11/4/2014, 7:23:54 PM

I applied a TextBlob function to create a new column in an SFrame, but saving the SFrame afterwards is very slow. My code is below:

from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

sf['sentimentpos'] = sf['text'].apply(lambda x: TextBlob(x, analyzer=NaiveBayesAnalyzer()).sentiment.p_pos)
sf.save('/home/sframe_sentiment')

P.S. More information on TextBlob: http://textblob.readthedocs.org/en/dev/advanced_usage.html#sentiment-analyzers

Is there any way to make it faster?

Comments

User 14 | 11/4/2014, 7:42:06 PM

Hi Shuning,

apply() is lazy and the function you applied seems expensive. The SFrame doesn't get materialized until save is called, which makes it seem like "save" is slow.

Thanks, Jay
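
To illustrate the point about laziness, here is a minimal pure-Python sketch. It is not how SFrame is implemented; `LazyColumn` and `slow_score` are hypothetical names made up for this example. It only shows why the cost of an expensive function can show up at save time rather than at apply time.

```python
import time


class LazyColumn:
    """Toy stand-in for a lazily evaluated column (NOT the SFrame implementation)."""

    def __init__(self, values, fn=None):
        self._values = values
        self._fn = fn          # pending function, not yet run
        self._cache = None     # filled in on first materialization

    def apply(self, fn):
        # Only record the function; no row is processed here.
        prev = self._fn
        composed = fn if prev is None else (lambda v: fn(prev(v)))
        return LazyColumn(self._values, composed)

    def materialize(self):
        # The pending function runs over every row the first time this is called.
        if self._cache is None:
            fn = self._fn if self._fn is not None else (lambda v: v)
            self._cache = [fn(v) for v in self._values]
        return self._cache

    def save(self):
        # Saving needs concrete values, so it forces materialization first.
        return list(self.materialize())


def slow_score(text):
    # Hypothetical expensive per-row function (stands in for the sentiment call).
    time.sleep(0.01)
    return len(text)


col = LazyColumn(["good", "bad", "fine"])

start = time.perf_counter()
lazy = col.apply(slow_score)      # returns almost immediately
apply_seconds = time.perf_counter() - start

start = time.perf_counter()
saved = lazy.save()               # the per-row work happens here
save_seconds = time.perf_counter() - start

print(f"apply: {apply_seconds:.4f} s, save: {save_seconds:.4f} s")
```

In this toy version, nearly all of the measured time lands in save(), even though the expensive work belongs to the function passed to apply().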


User 15 | 11/4/2014, 7:44:44 PM

SFrames are lazy, so that function is applied when save is called, not when apply is called. I'm not familiar with the textblob library, but I assume what you're doing there is a time-consuming operation. You can time a call to sf.materialize() before calling save to see how long the apply operation actually takes. I'm not really sure how to make the text analysis step faster, but at least you can time the two operations separately.
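
A rough sketch of timing the two steps separately is below. The `timed` helper is a hypothetical name written for this example, and the SFrame calls are left as comments because they assume the `sf` object and path from the question and the `sf.materialize()` call mentioned above (the exact method name may differ between library versions).

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print the elapsed wall-clock time, and return both."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return result, elapsed


# Assumed usage with the SFrame from the question (not run here):
#   timed("materialize", sf.materialize)              # cost of the apply step
#   timed("save", sf.save, '/home/sframe_sentiment')  # cost of writing to disk

# Stand-in call so the sketch runs without an SFrame:
total, seconds = timed("sum", sum, range(1000))
```

If most of the time shows up under "materialize", the text analysis is the bottleneck and the save itself is not the slow part.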


User 512 | 11/4/2014, 10:10:31 PM

Thanks both for your comments! That is good to know.