SFrame.Apply is not working - graphlab.text_analytics.count_words creating a corrupted dict type

User 2465 | 10/22/2015, 2:52:31 PM

Hi,

I have read a similar discussion below. It seems that SFrame.Apply is not working because the dict-type column created by graphlab.text_analytics.count_words is corrupted.

Here is my code: ` products['word_count'] = graphlab.text_analytics.count_words(products['review'])

products['word_count'].head()

dtype: dict
Rows: 10
[{'and': 5L, 'stink': 1L, 'because': 1L, 'ordered': 1L, 'just': 1L, 'had': 2L, 'wipes-ocean': 1L, 'hands': 1L, 'wipes,': 1L, 'replace': 1L, 'fab': 1L, 'softer': 1L, 'are': 3L, 'have': 2L, 'in': 1L, 'need': 1L, 'been': 1L, 'rough': 1L, 'ok,': 1L, 'issues': 1L, 'seemed': 1L, 'use': 1L, 'blue-1

products['word_count'].keys()


AttributeError                            Traceback (most recent call last)
<ipython-input-17-303bca64ab9a> in <module>()
----> 1 products['word_count'].keys()

AttributeError: 'SArray' object has no attribute 'keys'

rr = products['word_count']

rr['and']


IndexError                                Traceback (most recent call last)
<ipython-input-23-3d2ae5d31212> in <module>()
----> 1 rr['and']

C:\Anaconda\lib\site-packages\graphlab\data_structures\sarray.pyc in __getitem__(self, other)
   1001             return SArray(_proxy = self.__proxy__.copy_range(start, step, stop))
   1002         else:
-> 1003             raise IndexError("Invalid type to use for indexing")
   1004
   1005     def __materialize__(self):

IndexError: Invalid type to use for indexing `

Comments

User 4 | 10/29/2015, 6:25:04 AM

Hi @mcetraro, I think there is some confusion about the types of these variables:

- products is an SFrame -- a data table organized into columns (SArrays).
- products['word_count'] is an SArray -- a single column organized into rows of values (in this case they are dict values).

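As such, products['word_count'] is not expected to have dict methods like keys or to support string-based indexing. It is an SArray whose rows are dictionaries (10 of them in the head() output above), so nothing is corrupted; the dict operations belong to each row, not to the column.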

If you want to see the keys or the 'and' value of a single row, you could do this:

    products['word_count'][0].keys()  # returns the keys of the first row
    products['word_count'][0]['and']  # returns the value of the 'and' key of the first row
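To get a value for every row rather than just one, the same per-row lookup can be wrapped in a function and applied across the column. The sketch below is only an illustration: it uses a plain Python list of dicts as a stand-in for the SArray so that it runs without GraphLab installed, and the sample data and the helper name count_of are made up for the example. With GraphLab Create, the same per-row function could be passed to the column's apply method, as noted in the comment.

```python
# Plain-Python stand-in for an SArray of dicts (hypothetical sample data,
# not the real products['word_count'] column).
word_count = [
    {'and': 5, 'stink': 1, 'because': 1},
    {'love': 2, 'it': 1},
    {'and': 1, 'great': 3},
]


def count_of(word):
    """Build a per-row function that looks up `word` in one row's dict."""
    def per_row(row):
        # Rows that never mention the word give 0 instead of raising KeyError.
        return row.get(word, 0)
    return per_row


and_count = count_of('and')

# The container itself is not a dict, so it has no .keys() method ...
container_has_keys = hasattr(word_count, 'keys')

# ... but each element is a dict, so index a row first.
first_row_keys = sorted(word_count[0].keys())

# Apply the per-row function to every row. With GraphLab Create the
# equivalent call would be: products['word_count'].apply(and_count)
and_counts = [and_count(row) for row in word_count]

print(container_has_keys)
print(first_row_keys)
print(and_counts)
```

Using row.get(word, 0) instead of row[word] matters here because most reviews will not contain any given word, and a plain key lookup would raise KeyError on the first such row.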