Error loading an SFrame

User 2077 | 7/10/2015, 3:51:06 AM

Hi Dato!

I am using GLC 1.3.0. I wrote a script that saved an SFrame, and as far as I can tell, it completed successfully. But when I go to load the SFrame, I get the following error:

```python
In [2]: boardgamessf = gl.load_sframe("boardgames.sframe")
[INFO] Start server at: ipc:///tmp/graphlabserver-16579
 - Server binary: /Users/rlvoyer/Envs/bgg/lib/python2.7/site-packages/graphlab/unityserver
 - Server log: /tmp/graphlabserver1436500025.log
[INFO] GraphLab Server Version: 1.3.0

RuntimeError Traceback (most recent call last)
<ipython-input-2-901d77bf5152> in <module>()
----> 1 boardgamessf = gl.load_sframe("boardgames.sframe")

/Users/rlvoyer/Envs/bgg/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in load_sframe(filename)
    228 >>> sfloaded = graphlab.load_sframe('my_sframe')
    229 """
--> 230 sf = SFrame(data=filename)
    231 return sf
    232

/Users/rlvoyer/Envs/bgg/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in init(self, data, format, proxy)
    858 pass
    859 else:
--> 860 raise ValueError('Unknown input type: ' + format)
    861
    862 sframe_size = -1

/Users/rlvoyer/Envs/bgg/lib/python2.7/site-packages/graphlab/cython/context.pyc in exit(self, exctype, excvalue, traceback)
     37 def exit(self, exctype, excvalue, traceback):
     38 if not self.showcythontrace and exctype:
---> 39 raise exctype(exc_value)

RuntimeError: Runtime Exception. Unable to parse frame index file /Users/rlvoyer/Code/bgg++/data/boardgames.sframe/mb663f3fcef71d8eb.frameidx
```

In the server log, I see the same error about it being unable to parse the frame index file. I tried loading the SFrame in 1.4.1 and I get the same error. Any ideas? Thanks!

Robert

Comments

User 1178 | 7/10/2015, 6:49:10 PM

Hi Robert!

How are you!

Does a simple SFrame save/load work for you in your installation? Can you check the file at /Users/rlvoyer/Code/bgg++/data/boardgames.sframe/mb663f3fcef71d8eb.frameidx and see if it has the right format?

It looks something like:

    [sframe]
    version=0
    num_segments=0
    num_columns=2
    nrows=100
    [column_names]
    0000=a
    0001=b
    [column_files]
    0000=m_3f4d7288ac9af1b0.sidx:0
    0001=m_3f4d7288ac9af1b0.sidx:1
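For a quick look at whether an index file still has that shape, here is a minimal sketch. It assumes, based only on the sample above, that every non-blank line is either a `[section]` header or a `key=value` pair; the real parser may be stricter or looser. The helper name `find_malformed_lines` is made up for this example.

```python
import re

# Assumed line shapes, inferred only from the sample index file:
# a "[section]" header or a "key=value" pair.
SECTION = re.compile(r"^\[[^\[\]]+\]$")
KEY_VALUE = re.compile(r"^[^=\s][^=]*=.*$")


def find_malformed_lines(text):
    """Return (line_number, line) pairs that fit neither assumed shape."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are treated as harmless in this sketch
        if SECTION.match(stripped) or KEY_VALUE.match(stripped):
            continue
        bad.append((number, line))
    return bad


# Made-up index text in which the second column name contains a raw newline.
sample = "[sframe]\nversion=0\nnrows=100\n[column_names]\n0000=a\n0001=b\nc\n"
print(find_malformed_lines(sample))
```

To run it against a real index file, read the file with `open(path).read()` and pass the text in. Any line it reports is a good place to start looking.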

Thanks! Ping


User 2077 | 7/11/2015, 6:15:40 PM

Hi Ping!

I'm good! How are you?

I figured it out! First, here's some more evidence that the SFrame at least appears to be well-formed, but fails to load using GLC 1.4.1.

```python
In [3]: len(boardgame_sframe)
Out[3]: 175216

In [4]: boardgame_sframe.save("/Users/rlvoyer/Code/bgg/data/boardgames.sframe")

In [5]: type(boardgame_sframe)
Out[5]: graphlab.datastructures.sframe.SFrame
```

After saving in one Python shell, I immediately retried opening in another Python shell, and it failed to load with the same error:

```python
In [2]: boardgamessf = gl.load_sframe("/Users/rlvoyer/Code/bgg/data/boardgames.sframe")

RuntimeError Traceback (most recent call last)
<ipython-input-2-859261cafe0d> in <module>()
----> 1 boardgamessf = gl.load_sframe("/Users/rlvoyer/Code/bgg/data/boardgames.sframe")

/Users/rlvoyer/Envs/bgg/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in load_sframe(filename)
    220 >>> sfloaded = graphlab.load_sframe('my_sframe')
    221 """
--> 222 sf = SFrame(data=filename)
    223 return sf
    224

/Users/rlvoyer/Envs/bgg/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in init(self, data, format, proxy)
    850 pass
    851 else:
--> 852 raise ValueError('Unknown input type: ' + format)
    853
    854 sframe_size = -1

/Users/rlvoyer/Envs/bgg/lib/python2.7/site-packages/graphlab/cython/context.pyc in exit(self, exctype, excvalue, traceback)
     29 def exit(self, exctype, excvalue, traceback):
     30 if not self.showcythontrace and exctype:
---> 31 raise exctype(exc_value)

RuntimeError: Runtime Exception. Unable to parse frame index file /Users/rlvoyer/Code/bgg/data/boardgames.sframe/mc2fddd152372df8d.frameidx
```

Per your suggestion, I confirmed that I can save and load a much simpler and smaller SFrame:

```python
In [3]: sf = gl.SFrame({"foo": range(10), "bar": range(10)})

In [4]: sf.save("/Users/rlvoyer/Desktop/test.sframe")

In [5]: gl.load_sframe("/Users/rlvoyer/Desktop/test.sframe")
Out[5]:
Columns:
    bar int
    foo int

Rows: 10

Data:
+-----+-----+
| bar | foo |
+-----+-----+
|  0  |  0  |
|  1  |  1  |
|  2  |  2  |
|  3  |  3  |
|  4  |  4  |
|  5  |  5  |
|  6  |  6  |
|  7  |  7  |
|  8  |  8  |
|  9  |  9  |
+-----+-----+
[10 rows x 2 columns]
```

I looked at the index file for my boardgames SFrame and noticed that one of the columns had a very long name with lots of control characters in it. (It was extracted from JSON via an unpack operation.) After renaming the crazy-named column to something sane, everything works as expected. So thanks for pointing me in the right direction!

You guys should probably add some validation on the save side to handle this kind of thing, since it's definitely unexpected to not be able to load an SFrame that I saved successfully.
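Until something like that exists, here is a rough sketch of a check one could run before saving. It assumes "control characters" means ASCII code points below 0x20 plus 0x7F, which may not match what the index parser actually chokes on. The helper names (`has_control_chars`, `check_column_names`, `sanitized_names`) are hypothetical and not part of GraphLab Create.

```python
def has_control_chars(name):
    """True if the name contains an ASCII control character (assumed definition)."""
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name)


def check_column_names(names):
    """Raise ValueError if any column name contains control characters."""
    bad = [name for name in names if has_control_chars(name)]
    if bad:
        raise ValueError("column names contain control characters: %r" % (bad,))


def sanitized_names(names, replacement="_"):
    """Map each offending name to a cleaned one; clean names are left out."""
    mapping = {}
    taken = set(names)
    for name in names:
        if not has_control_chars(name):
            continue
        cleaned = "".join(
            replacement if (ord(ch) < 0x20 or ord(ch) == 0x7F) else ch
            for ch in name
        )
        # Avoid colliding with an existing or already assigned column name.
        candidate, suffix = cleaned, 1
        while candidate in taken:
            candidate = "%s_%d" % (cleaned, suffix)
            suffix += 1
        taken.add(candidate)
        mapping[name] = candidate
    return mapping


columns = ["foo", "this\n\nhas\n\nnewlines"]
print(sanitized_names(columns))
```

If `SFrame.column_names()` and `SFrame.rename()` (taking a dict of old-to-new names) behave as documented, something like `sf.rename(sanitized_names(sf.column_names()))` before `sf.save(...)` should apply the mapping, though that part is untested here.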


User 1178 | 7/14/2015, 6:17:52 PM

Hi Robert,

Glad your issue is resolved!

Do you mind sharing the crazy long column name with us (maybe with some obfuscation)? We will definitely add validation here!

Thanks! Ping


User 2077 | 7/18/2015, 4:01:05 AM

Hi Ping,

As a small test, I created an SFrame w/ some newlines in the column name. Lo and behold, I am able to save it successfully, but not able to reload it. Check it out:

```python
In [76]: sf = gl.SFrame({"this\n\nhas\n\nnewlines": range(10)})

In [77]: sf
Out[77]:
Columns:
    this

has

newlines int

Rows: 10

Data:
+-------------------------+
| this\n\nhas\n\nnewlines |
+-------------------------+
|            0            |
|            1            |
|            2            |
|            3            |
|            4            |
|            5            |
|            6            |
|            7            |
|            8            |
|            9            |
+-------------------------+
[10 rows x 1 columns]

In [78]: sf.save("/tmp/test.sframe")

In [79]: gl.load_sframe("/tmp/test.sframe")

RuntimeError Traceback (most recent call last)
<ipython-input-79-9f313bd2da91> in <module>()
----> 1 gl.load_sframe("/tmp/test.sframe")

/Users/rlvoyer/Envs/bgg/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in load_sframe(filename)
    220 >>> sfloaded = graphlab.load_sframe('my_sframe')
    221 """
--> 222 sf = SFrame(data=filename)
    223 return sf
    224

/Users/rlvoyer/Envs/bgg/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in init(self, data, format, proxy)
    850 pass
    851 else:
--> 852 raise ValueError('Unknown input type: ' + format)
    853
    854 sframe_size = -1

/Users/rlvoyer/Envs/bgg/lib/python2.7/site-packages/graphlab/cython/context.pyc in exit(self, exctype, excvalue, traceback)
     29 def exit(self, exctype, excvalue, traceback):
     30 if not self.showcythontrace and exctype:
---> 31 raise exctype(exc_value)

RuntimeError: Runtime Exception. Unable to parse frame index file /tmp/test.sframe/m85318a0e12b1dbba.frameidx
```
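My guess at the mechanism, as a toy model only: if column names are written to a line-oriented `key=value` file without escaping, a newline in a name spills into extra lines that a reader cannot interpret. The sketch below is not the actual SFrame serializer; `write_index` and `parse_index` are made-up stand-ins that only mimic the `[column_names]` section.

```python
def write_index(column_names):
    """Naive writer: one 'NNNN=name' line per column, with no escaping."""
    lines = ["[column_names]"]
    for position, name in enumerate(column_names):
        lines.append("%04d=%s" % (position, name))
    return "\n".join(lines) + "\n"


def parse_index(text):
    """Naive reader: every non-blank line after the header must be key=value."""
    names = []
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        if "=" not in line:
            raise ValueError("unable to parse line: %r" % line)
        names.append(line.split("=", 1)[1])
    return names


# A plain name round-trips through the toy format.
assert parse_index(write_index(["foo", "bar"])) == ["foo", "bar"]

# The name from the session above writes without complaint
# but cannot be read back.
text = write_index(["this\n\nhas\n\nnewlines"])
try:
    parse_index(text)
except ValueError as error:
    print(error)
```

Saving succeeds because the writer never inspects the name, and the failure only shows up on the read side, which matches what I saw.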


User 1178 | 7/20/2015, 12:13:16 AM

Thanks Robert! We will fix it in coming releases!

Ping


User 875 | 7/21/2015, 12:02:14 PM

Sorry, the error is resolved. It was my mistake.