ERROR: (operator():157): number of tokens in parsed column does not match with the sframe number

User 1220 | 1/21/2015, 11:17:08 PM

<pre>
t = sc.textFile("LICENSE")
sf = gl.SFrame.from_rdd(t)
15/01/21 18:10:29 INFO HadoopRDD: Input split: file:/Users/cubreto/Downloads/spark-1.2.0/LICENSE:0+22621
15/01/21 18:10:29 INFO HadoopRDD: Input split: file:/Users/cubreto/Downloads/spark-1.2.0/LICENSE:22621+22621
1421881829 : ERROR: (operator():157): number of tokens in parsed column does not match with the sframe number of columns
libc++abi.dylib: terminating with uncaught exception of type char const*
15/01/21 18:10:29 INFO Executor: Finished task 1.0 in stage 5.0 (TID 9). 1799 bytes result sent to driver
</pre>

Comments

User 15 | 1/22/2015, 1:37:12 AM

Hi,

Thanks for bringing this to our attention! What's happening here is that we're not handling empty lines in the middle of the file correctly. We'll get this fixed.

In the meantime, if you need this to work now (I'm guessing not since reading the license is in Spark's tutorial, but perhaps you'd run into a similar situation later), you could do this:

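<pre>
t = sc.textFile("LICENSE")

# Drop the empty lines. filter() returns a new RDD, so keep the result:
t = t.filter(lambda x: len(x) > 0)

# or, to keep a row for each empty line, replace it with a placeholder instead:
# t = t.map(lambda x: 'NA' if len(x) == 0 else x)

sf = gl.SFrame.from_rdd(t)
</pre>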
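
To see what the two workarounds do to the data, here is a minimal sketch in plain Python, with an ordinary list of strings standing in for the RDD so it runs without Spark or GraphLab Create. The sample lines and the helper names `is_nonempty` and `fill_empty` are made up for illustration.

```python
# Plain-Python stand-in for the RDD of lines; the contents are illustrative only.
lines = ["Apache License", "", "Version 2.0", "", ""]

def is_nonempty(line):
    """Predicate for the filter workaround: keep only lines with content."""
    return len(line) > 0

def fill_empty(line, placeholder="NA"):
    """Mapper for the map workaround: swap an empty line for a placeholder."""
    return placeholder if len(line) == 0 else line

# Equivalent of the filter workaround: empty lines are dropped.
filtered = [line for line in lines if is_nonempty(line)]

# Equivalent of the map workaround: every line is kept, empties become 'NA'.
filled = [fill_empty(line) for line in lines]

print(filtered)  # ['Apache License', 'Version 2.0']
print(filled)    # ['Apache License', 'NA', 'Version 2.0', 'NA', 'NA']
```

With a real RDD the same functions could be passed as `t.filter(is_nonempty)` or `t.map(fill_empty)`. The filter version changes the number of rows, while the map version keeps one row per line of the original file.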

Evan


User 15 | 2/5/2015, 5:25:23 PM

Hi,

Just to let you know, this has been fixed for the next release of GraphLab Create.