SFrame.apply() does not work on a 'dict' type column

User 2419 | 10/15/2015, 9:26:37 PM

Hi,

I applied the function defined below (awesome_count) to a dict-type column (an SArray) of the products SFrame for the 2nd assignment. Here is the code I used:

> def awesome_count(dicto):
>     if 'awesome' in dicto:
>         return dicto.count('awesome')
>     else:
>         return 0
>
> # test function awesome_count on a single cell
> awesome_count(products[5]['word_count'])
>
> # use function awesome_count on SArray in SFrame products
> products['awesome'] = products['word_count'].apply(awesome_count)

The output of the code is as follows:


RuntimeError                              Traceback (most recent call last)
<ipython-input-37-2918ee116b28> in <module>()
      9 #products['awesome'] = products['word_count'].apply(lambda x: dict(x) if 'awesome' in x else 0)
     10
---> 11 products['awesome'] = products['word_count'].apply(awesome_count)
     12
     13 #wordcountorig = products['word_count']

C:\Users\Gokul\AppData\Local\Dato\Dato Launcher\lib\site-packages\graphlab\datastructures\sarray.pyc in apply(self, fn, dtype, skip_undefined, seed)
   1624
   1625         with cythoncontext():
-> 1626             return SArray(proxy=self.proxy.transform(fn, dtype, skip_undefined, seed))
   1627
   1628

C:\Users\Gokul\AppData\Local\Dato\Dato Launcher\lib\site-packages\graphlab\cython\context.pyc in exit(self, exctype, excvalue, traceback)
     47         if not self.showcythontrace:
     48             # To hide cython trace, we re-raise from here
---> 49             raise exctype(excvalue)
     50         else:
     51             # To show the full trace, we do nothing and let exception propagate

RuntimeError: Runtime Exception. The system cannot find the file specified.


I cannot proceed without using the apply() function here. I exported the products SFrame to a CSV file and performed the same operation on a pandas DataFrame in Anaconda, and it worked (except that the column lost its dict type when it was saved to CSV).

Could you please help?

Thanks, Gokul

Comments

User 2419 | 10/15/2015, 9:28:24 PM

I also tried to use pickle (pickling and unpickling) to preserve the dict type and perform the same operation on a pandas DataFrame, but pickle is not available in graphlab.


User 1207 | 10/16/2015, 6:25:49 PM

Hello lokomotiv,

It seems there are several things going on here. First, as written, your function won't work in an apply operation on a dictionary column, as Python dictionaries do not have a count method. However, this wouldn't explain the error you are seeing. For that, note that many of the SFrame / SArray operations are evaluated lazily -- operations are strung together and evaluated only when needed. This avoids a lot of intermediate copies and disk use, but it also means that errors sometimes only show up later on. It seems that some file the lazy evaluation depended on was changed, deleted, or overwritten between when you constructed the query and when you tried to evaluate it, but I can't tell from the code you've provided.
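
As a rough analogy for the lazy-evaluation point, here is a minimal sketch in plain Python. It does not use graphlab at all: the generator `lazy_lines` and the temporary file are hypothetical stand-ins for a lazily evaluated SArray and whatever file backs it. Building the pipeline succeeds, and the "file not found" error only appears when the result is actually requested.

```python
import os
import tempfile


def lazy_lines(path):
    # Generator: the file is not opened until the first item is requested.
    with open(path) as f:
        for line in f:
            yield line.strip()


# Create a small backing file (a stand-in for on-disk intermediate data).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("awesome\n")

pipeline = lazy_lines(path)  # no error here: nothing has been evaluated yet
os.remove(path)              # the backing file disappears in the meantime

try:
    result = list(pipeline)  # evaluation happens here, and only now fails
    error = None
except (IOError, OSError) as exc:
    error = exc

print(type(error).__name__)
```

The failure is reported at the line that forces evaluation, not at the line that built the pipeline, which is why the traceback can point at a statement that looks unrelated to the real cause.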

Also, pickle is part of Python's standard library, not graphlab -- see https://docs.python.org/2/library/pickle.html.
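
For reference, a minimal sketch of a pickle round trip on a dictionary, using only the standard library. The `word_count` value here is made-up sample data, not taken from the products SFrame.

```python
import pickle

# Hypothetical word-count dictionary for a single review.
word_count = {"awesome": 2, "great": 1}

payload = pickle.dumps(word_count)  # serialize to bytes
restored = pickle.loads(payload)    # deserialize back to a dict

print(type(restored).__name__, restored["awesome"])
```

Unlike a CSV export, the restored object is still a dict, so key lookups keep working.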

Hope that helps! -- Hoyt


User 2463 | 10/21/2015, 7:50:38 PM

Hi Lokomotiv

Hoytak is correct: you can't use the count() function on a dictionary. What you need to return is dicto['awesome']. That gives you the value stored under the key 'awesome', which is the number of times the word occurs.
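
A minimal sketch of the corrected function, assuming each cell of the word_count column behaves like a plain Python dict mapping words to counts. The `sample` dictionary is made-up test data.

```python
def awesome_count(word_counts):
    # Look up the count stored under 'awesome'; fall back to 0 if the key is absent.
    return word_counts.get('awesome', 0)


# Hypothetical cell value, standing in for products[5]['word_count'].
sample = {'awesome': 2, 'book': 1}

# Dictionaries have no count() method, which is why the original version fails.
has_count_method = hasattr(sample, 'count')

print(awesome_count(sample), awesome_count({'book': 1}), has_count_method)
```

Using get() with a default replaces the if/else branch. The same function can then be passed to apply() as in the original post; that step is not exercised here.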


User 2465 | 10/22/2015, 3:01:53 PM

Hi Lokomotiv,

I am facing the same problem and have just posted a new discussion. As I mention in my post, I am pretty sure that the problem originates in the graphlab.text_analytics.count_words method. It is supposed to create a dict data type, but whatever it creates doesn't behave like a dict.

I think I am going to try doing this with pandas. Did you have success using pandas to get the proper answer?

Thanks.


User 2808 | 12/13/2015, 3:12:09 AM

Hello locomotiv,

I was also stuck with the same problem you had. I solved it with the following: products['word_count'].apply(lambda row: awesome_count(row))

Thanks, Aadeshnpn