Character Encoding

User 922 | 12/5/2014, 8:44:41 PM

Hello, I am reading in data from a UTF-8 encoded file that contains non-Latin characters. The display in show() is showing the dreaded ? for those characters. Inside the frame, each one appears as a hex value. Any idea how to show the proper character for those bytes?

Comments

User 4 | 12/5/2014, 11:41:44 PM

Hi @wrivers, this could be because SFrames store string values as raw bytes, so the Python side is unaware of the underlying encoding. I would be happy to help investigate. Can you provide a sample of data that exhibits this issue? Thanks!
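
To illustrate the raw-bytes point in general terms, here is a minimal sketch in plain Python 3. It does not use the SFrame API at all; the variable names are made up for the example, and it only assumes the stored bytes are valid UTF-8.

```python
# Five copies of U+00E9 (e with acute accent), written as an escape so the
# snippet does not depend on the encoding of the source file itself.
text = "\u00e9" * 5

# What a byte-oriented store would hold: the UTF-8 encoding of the text.
raw = text.encode("utf-8")
print(len(text), len(raw))  # character count vs. byte count

# Decoding with the right codec recovers the original text.
decoded = raw.decode("utf-8")
print(decoded == text)

# Decoding with the wrong codec cannot map the bytes to characters. With
# errors="replace", each undecodable byte becomes U+FFFD, which many
# displays render as a question mark.
garbled = raw.decode("ascii", errors="replace")
print(ascii(garbled))
```

If the bytes come back intact like this, the data itself is fine and the problem is on the display side.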


User 922 | 12/8/2014, 3:36:22 PM

Well, for example, when I tried loading this map, {'ééééé':['éééééé']}, the key displays fine but the value displays as ? marks.


User 4 | 12/8/2014, 7:04:01 PM

Thanks @wrivers. This does appear to be a bug in how GraphLab Canvas handles UTF-8 encoded string values. I will follow up using the data you provided and make sure this is fixed in the next release of GraphLab Create.
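
In the meantime, one way to check that the underlying data is intact is to decode the nested values yourself in plain Python. This is only a sketch: `decode_nested` is a hypothetical helper, not part of GraphLab Create, and it assumes Python 3 and that the values are UTF-8 bytes nested in dicts and lists, like the map in the example above. It does not change what Canvas displays.

```python
def decode_nested(value, encoding="utf-8"):
    """Recursively decode bytes inside dicts and lists (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        # Decode both keys and values, since either may be raw bytes.
        return {
            decode_nested(key, encoding): decode_nested(item, encoding)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [decode_nested(item, encoding) for item in value]
    # Anything else (numbers, already-decoded strings, None) passes through.
    return value


# The map from the example above, expressed as raw UTF-8 bytes:
# b"\xc3\xa9" is the two-byte UTF-8 encoding of U+00E9.
sample = {b"\xc3\xa9" * 5: [b"\xc3\xa9" * 6]}
decoded_sample = decode_nested(sample)
print(ascii(decoded_sample))
```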


User 2355 | 10/3/2015, 7:10:32 AM

More basics about... Character Encoding

Biden


User 940 | 10/6/2015, 5:28:53 PM

@nicolbiden,

Thanks for the pointer!

Cheers! -Piotr