using apply makes an SFrame slow.

User 3802 | 3/20/2016, 5:04:58 AM

I have an SFrame named frame with one column named "search_term". I create a new SFrame, robust, using the code

    robust['robust'] = (frame['search_term'][0:100]).apply(lambda x: ''.join(list(correct(x))))

where correct is a spellcheck function I defined earlier. The head of robust looks something like this:

    robust
    angle bracket
    bracket
    deckover
    rain shower head
    shower only faucet
    convection otr
    microwave over stove
    microwave
    emergency light
    mdf 3/4

The problem is that when I try to use the SFrame robust, everything is very slow. For example, if I access the first entry with the code

    import time
    t0 = time.time()
    robust['robust'][0]
    t1 = time.time()
    t1 - t0

    'angle bracket'
    56.56002116203308

It takes 56 seconds, which is a lot of time for a single entry. If I run the code above a second time, i.e.

    import time
    t0 = time.time()
    robust['robust'][0]
    t1 = time.time()
    t1 - t0

    'angle bracket'
    0.005273103713989258

we get a much faster result, so I do not know what is going on the first time I perform the operation. The main problem is that I am only taking the first 100 rows of the SFrame. When I use the 65000 rows that I really want, the code takes forever to run. I do not know what the problem is. Even when I use the identity function

def identity(x): return x

and apply it, I get the same problem. If I apply lambda x: x, everything seems to be OK.

Comments

User 1190 | 3/21/2016, 6:32:54 PM

Hi,

This is caused by lazy evaluation. The 'robust' column is computed lazily each time you ask for it. You can force materialization by calling robust.__materialize__(). After that, subsequent use of the SFrame will be much faster.
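
To illustrate the idea, here is a minimal pure-Python sketch of a lazily evaluated column. It is a toy model only, not how SFrame is implemented: LazyColumn, its materialize() method and slow_correct are hypothetical names made up for this example, and slow_correct merely stands in for the correct() function from the question. The only real SFrame call discussed in this thread is __materialize__().

```python
import time


class LazyColumn(object):
    """Toy stand-in for a lazily evaluated column (not the SFrame internals)."""

    def __init__(self, values, func):
        self._values = list(values)
        self._func = func
        self._cache = None   # filled in by materialize()
        self.calls = 0       # counts how many times func has been run

    def _compute(self):
        # Run func over every row and return the results.
        out = []
        for value in self._values:
            self.calls += 1
            out.append(self._func(value))
        return out

    def materialize(self):
        # Compute the column once and keep the result for later accesses.
        if self._cache is None:
            self._cache = self._compute()
        return self

    def __getitem__(self, index):
        if self._cache is not None:
            return self._cache[index]
        # Not materialized: the whole column is recomputed for this access.
        return self._compute()[index]


def slow_correct(term):
    """Hypothetical stand-in for the poster's correct() spellchecker."""
    time.sleep(0.01)  # pretend each call is expensive
    return term.strip().lower()


terms = ["Angle Bracket ", "L Bracket", "Deck Over"]
lazy = LazyColumn(terms, slow_correct)  # nothing is computed yet
calls_at_creation = lazy.calls

first = lazy[0]                   # pays for the whole column
calls_after_first = lazy.calls

lazy.materialize()                # compute once and cache
calls_after_materialize = lazy.calls

second = lazy[0]                  # served from the cache, no recomputation
calls_after_second = lazy.calls
```

In this sketch, creating the column costs nothing, the first access runs the function over every row, and accesses after materialize() no longer call the function at all. With an expensive function and 65000 rows, it is the unmaterialized accesses that dominate the running time.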