Error when loading CSV

User 2487 | 10/26/2015, 3:58:30 PM

Canvas is accessible via web browser at the URL: http://localhost:58962/index.html
Opening Canvas in default web browser.
[ERROR] GraphLab Canvas: (<type 'exceptions.UnicodeDecodeError'>, UnicodeDecodeError('utf8', 'Date;Licence;Group;UID;First Name;Last Name;Gender;Birthdate;Time (s);Note;Angle Front (\xf8);Angle Back (\xf8);Angle Left (\xf8);Angle Right (\xf8);Note;Power Beginning (W per kg);Power End (W per kg);Difference (%));Note Explosivity;Note Resistance;Stability left (s);Note Stability;Total points;Speed Begin;Speed End', 88, 89, 'invalid start byte'), <traceback object at 0x0000000019C93148>)
[ERROR] GraphLab Canvas: (<type 'exceptions.UnicodeDecodeError'>, UnicodeDecodeError('utf8', 'Date;Licence;Group;UID;First Name;Last Name;Gender;Birthdate;Time (s);Note;Angle Front (\xf8);Angle Back (\xf8);Angle Left (\xf8);Angle Right (\xf8);Note;Power Beginning (W per kg);Power End (W per kg);Difference (%));Note Explosivity;Note Resistance;Stability left (s);Note Stability;Total points;Speed Begin;Speed End', 88, 89, 'invalid start byte'), <traceback object at 0x0000000019CBE548>)

Comments

User 4 | 10/26/2015, 9:10:54 PM

Hi @savioz, thanks for letting us know! This is a known issue with string handling in GraphLab Create in general, and in GraphLab Canvas in particular. The root cause is that the str type in Python does not imply any particular string encoding, but to display strings on the screen we have to assume one. Since UTF-8 is the predominant encoding these days, we assume UTF-8 in a few places, such as Canvas. However, nothing underneath enforces that encoding or converts to it, so raw input data in a different encoding will hit this error.
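To make the root cause concrete, here is a minimal sketch that reproduces the decode failure on the byte reported in the error message (`\xf8` at position 88). It is written in Python 3 syntax, whereas the traceback above comes from Python 2. The helper `try_decode` is a hypothetical name used only for illustration, and `cp850` and `latin-1` are guesses at the file's real encoding, not something the error output confirms.

```python
# Part of the CSV header from the error message, as raw bytes.
raw = b"Angle Front (\xf8)"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0xF8 cannot start a UTF-8 sequence, so this yields None
# (the "invalid start byte" failure from the log).
utf8_result = try_decode(raw, "utf-8")

# Assumption: the file may come from a DOS code page, where 0xF8 is the degree sign.
cp850_result = try_decode(raw, "cp850")

# Assumption: in Latin-1 the same byte is the letter "ø" instead.
latin1_result = try_decode(raw, "latin-1")
```

The same byte decodes to different characters under different encodings, which is why the source encoding has to be known (or guessed from context, such as an angle column wanting a degree sign) rather than detected reliably from the bytes alone.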

The workaround is to make sure all of your input data (whether it is CSV, TSV, etc.) is in UTF-8 encoding before reading it into SFrame.
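One way to apply this workaround is to re-encode the file with plain Python before loading it. The sketch below assumes you already know the file's original encoding. The function name `convert_to_utf8` and the file names in the usage note are hypothetical, and `cp850` is only an example value.

```python
import io


def convert_to_utf8(src_path, dst_path, source_encoding):
    """Re-encode a text file to UTF-8.

    source_encoding must be supplied by the caller; it is not detected.
    """
    # newline="" keeps the original line endings untouched.
    with io.open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For example, `convert_to_utf8("results.csv", "results_utf8.csv", "cp850")` would write a UTF-8 copy, which could then be read into an SFrame in place of the original file. If the wrong source encoding is passed, the conversion may still succeed but produce the wrong characters, so it is worth checking the header row of the output by eye.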


User 2487 | 10/27/2015, 8:21:46 AM

Not good; it should at least not crash. Why not show a message when the input data is not UTF-8?


User 4 | 10/27/2015, 6:58:01 PM

Hi @savioz -- sorry about that, the crash is definitely a bug we intend to fix. Thanks for the suggestion, we will consider showing a warning message when the input is not valid UTF-8 in a future release!