Problem loading a CSV file into GraphLab

User 4834 | 4/15/2016, 9:04:29 PM

Hi, I am trying to load a CSV file into GraphLab but I get an error. I can open the same CSV file elsewhere without any problem, including in R.

Any idea what might cause this?

Comments

User 4 | 4/15/2016, 9:35:34 PM

Hi @Tri,

There is not really such a thing as "standard" CSV and there are incompatibilities across all systems that deal with CSV as a result. While we strive to support as many CSV formats as possible, there are some cases that aren't currently handled by our CSV parser. It's hard to tell which case this falls into. If you can provide a data sample that will trigger the parse error I can try to find a workaround for you.

In the meantime, you could try using another CSV parser in Python (for instance, the [pandas read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.readcsv.html)), which may give better results with this file. Note that this will only help if the data is small enough to fit into RAM; Pandas cannot read CSVs larger than RAM the way GraphLab Create can.
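To narrow a failing file down to a small sample that triggers the parse error, one option is to look for rows whose field count differs from the header. The sketch below uses Python's standard `csv` module, which is yet another parser with its own rules, so a row it accepts may still fail elsewhere. The function name `find_irregular_rows` and the sample text are made up for illustration.

```python
import csv
import io


def find_irregular_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = len(next(reader))  # the first row is assumed to be the header
    irregular = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far
            irregular.append((reader.line_num, len(row)))
    return irregular


# Hypothetical sample: the third line has an unquoted comma inside a field.
sample = (
    "id,text,flag\n"
    '1,"hello, world",true\n'
    "2,unquoted, comma,false\n"
    "3,ok,true\n"
)
irregular = find_irregular_rows(sample)
print(irregular)
```

The reported line numbers can then be used to cut a few offending lines out of the original file and share them.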


User 4834 | 4/15/2016, 11:42:54 PM

Hi Zach, thanks for your kind help. I could not attach the file to this post (it is only 8 KB), even though I can open it. It is weird.


User 4834 | 4/15/2016, 11:46:13 PM

I tried graphlab.SFrame.read_csv. It gives errors.

As I could not attach the file, I took a screenshot and attached the image.


User 4 | 4/18/2016, 1:18:27 AM

Hi @Tri, for a file that small, you should have better luck with the pandas read_csv parser (once you have a Pandas dataframe, you can convert it to an SFrame using the SFrame constructor: sf = gl.SFrame(df)). Please let me know if that method works better for you.
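A minimal sketch of that route, assuming pandas is installed: the CSV text here is invented sample data standing in for the real file (in practice the file path would be passed to `read_csv`), and the GraphLab import is guarded so the pandas part runs even where GraphLab Create is not available.

```python
import io

import pandas as pd

# Invented stand-in for a small CSV file; replace with the real file path.
csv_text = 'id,name,score\n1,alice,3.5\n2,"bob, jr.",4.0\n'
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)

try:
    import graphlab as gl  # GraphLab Create, only needed for the conversion

    sf = gl.SFrame(df)  # the SFrame constructor takes a pandas DataFrame
except ImportError:
    sf = None  # GraphLab Create is not installed in this environment
```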

I would be happy to investigate further and see if we can make SFrame's CSV parser compatible with this file but I will need the actual CSV file and not a screenshot. If you would like to help us support this type of CSV file, please open a GitHub issue in the SFrame issues since this is an open source project and we're tracking issues publicly there. You should be able to attach the CSV file there (you may need to zip it first) -- sorry this format doesn't support attaching that type of file.


User 5151 | 4/26/2016, 8:18:33 AM

Hi Zach, I came across a similar problem: I'm trying to load a file with 800,000 lines, but Dato can only parse 200,000 of them. This line, among many others, failed to load:

`ObjectId(570af7a06244ad4777000e0f),Mon Apr 11 01:02:24 +0000 2016,719329719976472576,719329719976472576,RT @realDonaldTrump: The people of Colorado had their vote taken away from them by the phony politicians. Biggest story in politics. This w…,"<a href=""http://twitter.com"" rel=""nofollow"">Twitter Web Client</a>",false,,,,,,"{""id"":488578718,""idstr"":""488578718"",""name"":""Bill"",""screenname"":""tomservo10"",""location"":""Phoenix, AZ"",""url"":null,""description"":""Current events. Some politics. I live Tweet film and television. Hashtag games."",""protected"":false,""verified"":false,""followerscount"":4719,""friendscount"":5188,""listedcount"":197,""favouritescount"":135956,""statusescount"":101877,""createdat"":""Fri Feb 10 16:14:14 +0000 2012"",""utcoffset"":-25200,""timezone"":""Arizona"",""geoenabled"":false,""lang"":""en"",""contributorsenabled"":false,""istranslator"":false,""profilebackgroundcolor"":""131516"",""profilebackgroundimageurl"":""http://pbs.twimg.com/profilebackgroundimages/532950858128838658/fkMV4yK.jpeg"",""profilebackgroundimageurlhttps"":""https://pbs.twimg.com/profilebackgroundimages/532950858128838658/fkMV4yK.jpeg"",""profilebackgroundtile"":true,""profilelinkcolor"":""009999"",""profilesidebarbordercolor"":""FFFFFF"",""profilesidebarfillcolor"":""EFEFEF"",""profiletextcolor"":""333333"",""profileusebackgroundimage"":false,""profileimageurl"":""http://pbs.twimg.com/profileimages/680873072186937344/H1OUZb83normal.jpg"",""profileimageurlhttps"":""https://pbs.twimg.com/profileimages/680873072186937344/H1OUZb83normal.jpg"",""profilebannerurl"":""https://pbs.twimg.com/profilebanners/488578718/1458784552"",""defaultprofile"":false,""defaultprofileimage"":false,""following"":null,""followrequestsent"":null,""notifications"":null}",,,,,"{""createdat"":""Mon Apr 11 00:50:56 +0000 2016"",""id"":{""$numberLong"":""719326834538778625""},""idstr"":""719326834538778625"",""text"":""The people of Colorado had their vote taken away from them by the phony politicians. Biggest story in politics. This will not be allowed!"",""source"":""\u003ca href=\""http://twitter.com/download/android\"" rel=\""nofollow\""\u003eTwitter for Android\u003c/a\u003e"",""truncated"":false,""inreplytostatusid"":null,""inreplytostatusidstr"":null,""inreplytouserid"":null,""inreplytouseridstr"":null,""inreplytoscreenname"":null,""user"":{""id"":25073877,""idstr"":""25073877"",""name"":""Donald J. Trump"",""screenname"":""realDonaldTrump"",""location"":""New York, NY"",""url"":""http://www.DonaldJTrump.com"",""description"":""#MakeAmericaGreatAgain #Trump2016"",""protected"":false,""verified"":true,""followerscount"":7538527,""friendscount"":41,""listedcount"":32550,""favouritescount"":80,""statusescount"":31615,""createdat"":""Wed Mar 18 13:46:38 +0000 2009"",""utcoffset"":-14400,""timezone"":""Eastern Time (US \u0026 Canada)"",""geoenabled"":true,""lang"":""en"",""contributorsenabled"":false,""istranslator"":false,""profilebackgroundcolor"":""6D5C18"",""profilebackgroundimageurl"":""http://pbs.twimg.com/profilebackgroundimages/530021613/trumpscotland__43of70cc.jpg"",""profilebackgroundimageurlhttps"":""https://pbs.twimg.com/profilebackgroundimages/530021613/trumpscotland__43of70cc.jpg"",""profilebackgroundtile"":true,""profilelinkcolor"":""0D5B73"",""profilesidebarbordercolor"":""BDDCAD"",""profilesidebarfillcolor"":""C5CEC0"",""profiletextcolor"":""333333"",""profileusebackgroundimage"":true,""profileimageurl"":""http://pbs.twimg.com/profileimages/1980294624/DJTHeadshotV2normal.jpg"",""profileimageurlhttps"":""https://pbs.twimg.com/profileimages/1980294624/DJTHeadshotV2normal.jpg"",""profilebannerurl"":""https://pbs.twimg.com/profilebanners/25073877/1460170262"",""defaultprofile"":false,""d`


User 5151 | 4/26/2016, 8:42:11 AM

And once the pandas import had finished, `sf = gl.SFrame(df)` produced this error: `TypeError: A common type cannot be infered from types float, string.`


User 4 | 4/26/2016, 7:44:30 PM

Hi @allenzhao,

I believe the two issues are separate and unrelated.

The first issue is that SFrame read_csv does not behave the same way as any other CSV parser (and in fact no two CSV parsers really behave the same way) so you'll always get slightly different results with different parsers, since there isn't really a standard they can all conform to. If you can track down the specific parsing error in this sample you're welcome to submit a pull request to SFrame, or put your data in another format that is well-specified (such as a PostgreSQL database) so that SFrame can read it more reliably. Or, as a workaround, use another parser like Pandas' CSV parser.

The second issue is that Pandas dataframes appear to allow heterogeneous columns (the same column can contain multiple types of data) while SFrame can only contain homogeneous columns (one column must contain only one type of data). I think you can work around it by converting the Pandas dataframe columns to a single type (you'll have to experiment a bit to see which column is heterogeneous):

```
# this is what my test dataframe looks like
In [18]: df
Out[18]:
   0  1      2
0  1  2      3
1  4  5  hello

# inspect the dataframe to see column types
In [17]: df.dtypes
Out[17]:
0     int64
1     int64
2    object
dtype: object

# "object" dtype is not trivially convertible to SFrame (this is a mixed column type in this case)
# convert that column to String, then convert the dataframe to SFrame
# the name of the affected column is "2"
In [30]: df[2] = df[2].astype(str)

In [33]: sframe.SFrame(df)
Out[33]:
Columns:
    0    int
    1    int
    2    str

Rows: 2

Data:
+---+---+-------+
| 0 | 1 |   2   |
+---+---+-------+
| 1 | 2 |   3   |
| 4 | 5 | hello |
+---+---+-------+
[2 rows x 3 columns]
```
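Rather than experimenting column by column, the heterogeneous columns can be listed by collecting the Python types found in each one. This is only a sketch: `mixed_type_columns` is a made-up helper name, and the dataframe mirrors the small test dataframe from this reply. Note that pandas stores a missing value in an object column as a float `NaN`, so a string column with gaps would show up as `float` plus `str`, which might be what the "float, string" error is pointing at.

```python
import pandas as pd


def mixed_type_columns(df):
    """Map each column holding more than one Python type to the type names found."""
    mixed = {}
    for col in df.columns:
        type_names = sorted({type(value).__name__ for value in df[col]})
        if len(type_names) > 1:
            mixed[col] = type_names
    return mixed


# Same shape as the test dataframe above: column 2 mixes an int and a string.
df = pd.DataFrame({0: [1, 4], 1: [2, 5], 2: [3, "hello"]})
mixed = mixed_type_columns(df)
print(mixed)
```

Any column reported here is a candidate for `.astype(str)` before the dataframe is handed to the SFrame constructor.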


User 5151 | 4/27/2016, 12:41:17 AM

Thanks so much @Zach !


User 5151 | 4/27/2016, 3:54:53 AM

@Zach, for my case, .astype(str) won't work. I tried df.applymap(str) to apply str to every cell, and then it worked.
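Applying `str` to every cell also turns numeric columns and missing values into strings. If that is not wanted, a narrower variant is to convert only the object-dtype columns and keep missing values as `None`. The sketch below is one hypothetical way to do that: `stringify_object_columns` is a made-up name, it assumes every cell holds a scalar, and whether SFrame accepts the result for a given dataset has not been verified here.

```python
import pandas as pd


def stringify_object_columns(df):
    """Return a copy where object-dtype columns hold only str or None."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_object_dtype(out[col]):
            # Keep missing values as None instead of the string "nan".
            values = [None if pd.isna(v) else str(v) for v in out[col]]
            out[col] = pd.Series(values, index=out.index, dtype=object)
    return out


# Invented example: "n" is purely numeric, "mixed" holds int, str and NaN.
df = pd.DataFrame({"n": [1, 2, 3], "mixed": [3, "hello", float("nan")]})
converted = stringify_object_columns(df)
print(converted)
```

Numeric columns such as `n` are left untouched, so they keep their numeric type after the conversion.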