SFrame.from_rdd(bagRdd) only creates one column?

User 630 | 2/18/2015, 1:54:33 AM

I am trying out the example code line by line: http://dato.com/learn/gallery/notebooks/sparkandgraphlabcreate.html#Step-4:-Learn-Topic-Model. However, I see a different outcome after the step "data = gl.SFrame.from_rdd(bagRdd)": all the data ends up in a single column, "X1", as shown below. Does anyone know what the problem is, or have any suggestions? Thanks.

Regards, Wenfeng


In [133]: data = gl.SFrame.from_rdd(bagRdd)
In [135]: data.column_names()
Out[135]: ['X1']
In [137]: gl.VERSION
Out[137]: '1.3.0'
In [139]: data[0]
Out[139]: {'X1': [0, {'1970s': 1, '1980s': 1, '1982': 1, '2001': 1, '2004': 1, 'a': 2, 'academies': 1, 'academy': 5, 'alain': 1, 'alainconnes': 1, 'algebras': 2,

In [140]: bagRdd.take(1)
Out[140]: [(0, {u'1970s': 1, u'1980s': 1, u'1982': 1, u'2001': 1, u'2004': 1, u'a': 2, u'academies': 1, u'academy': 5, u'alain': 1, u'alainconnes': 1, u'algebras': 2, ...

Comments

User 16 | 2/18/2015, 7:36:44 PM

You're right, there is a bug in that example. Sorry about that.

Cell #13 should read:

data = gl.SFrame.from_rdd(bagRdd)
data = data.unpack('X1')
data.rename({'X1.0':'id','X1.1':'bag_of_words'})

I will update the website shortly.

Thanks, Toby


User 630 | 2/18/2015, 7:39:24 PM

Thanks, I just found the same solution: data=data.unpack("X1").rename({"X1.0":"id", "X1.1":"bag_of_words"})

Wenfeng


User 630 | 2/18/2015, 7:48:01 PM

Is it possible to update from_rdd() to accept column names, e.g. gl.SFrame.from_rdd(bagRdd, columns=["id", "bag_of_words"])?

Wenfeng


User 954 | 2/18/2015, 7:55:59 PM

Hi Wenfeng,

In fact, this is due to a change in the RDD -> SFrame conversion in glc 1.3. In glc 1.3, an RDD always translates to an SFrame with one column.

The notebook uses glc 1.2.1, which is why it shows a multi-column SFrame. As Toby mentioned, data = data.unpack('X1') will do the trick.
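
For anyone reading this later, here is a rough plain-Python sketch of what the unpack-and-rename step does to each row. It does not use GraphLab Create or Spark, and it is not how SFrame.unpack is implemented; unpack_column is a hypothetical helper written only for illustration, and the sample row is a shortened copy of data[0] from the question.

```python
# Plain-Python illustration only: no GraphLab Create or Spark involved.
# unpack_column is a hypothetical helper, not part of any library.

def unpack_column(rows, column, new_names):
    """Split a list-valued column into one named column per element."""
    unpacked = []
    for row in rows:
        values = row[column]
        if len(values) != len(new_names):
            raise ValueError(
                "expected %d values, got %d" % (len(new_names), len(values))
            )
        # Keep every other column, then add one column per unpacked element.
        new_row = {k: v for k, v in row.items() if k != column}
        new_row.update(zip(new_names, values))
        unpacked.append(new_row)
    return unpacked


# One row shaped like data[0] in the question (word counts shortened).
rows = [{'X1': [0, {'1970s': 1, 'a': 2, 'academy': 5}]}]

result = unpack_column(rows, 'X1', ['id', 'bag_of_words'])
print(result)
```

The single 'X1' cell holding [id, word counts] becomes two separate columns, which is the shape the rest of the notebook expects.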

Sorry for the inconvenience.


User 630 | 2/18/2015, 9:13:38 PM

I see. Thanks for the explanation.


User 954 | 2/18/2015, 11:13:09 PM

Just to add to the previous comment:

SchemaRDD <-> SFrame conversion handles multi-column translation: a SchemaRDD with multiple columns translates to an SFrame with multiple columns, and vice versa.