export_csv is very slow

User 5189 | 5/12/2016, 10:01:56 AM

Has anyone had problems with export_csv or save? I was trying to save 10 * 10^6 lines and it was slow. Then I split the file into 10k chunks (with my own C code, in less than a second). But it still can't export the SFrame; it looks like it's stuck on something :-(


User 12 | 5/12/2016, 6:43:20 PM

Hi @gguliash, Sorry to hear about the issue. Just to confirm, you're trying to save an SFrame with 10^7 rows, but it's slow, so you split the SFrame into 10,000 chunks, and the chunks won't save correctly - is that a correct understanding of the situation?

Assuming I'm reading you correctly, can you provide a little more context, either on this public thread or by email to Dato's support email address? In particular, how big is each row? How many columns are there, and are any of them large strings, lists, dicts, etc.? What OS and GLC version are you running? Once the SFrame is split into chunks, do all of the chunks fail to save, or just some of them?

Thanks, Brian

User 5189 | 5/13/2016, 6:24:02 AM

Thanks @brian for the reply. It seems Dato was not computing anything until it had to write the results (lazy SFrame :smile: ). So it was not actually an export_csv problem; the export just had to run all the pending computation first.
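
For anyone who finds this thread later, here is a rough sketch of the effect in plain Python. It does not use the SFrame API at all: the generator is only a stand-in for a lazily evaluated column, and `tracked` and `calls` are hypothetical names made up for the illustration. The point is that defining the pipeline costs nothing, and all the work shows up when the rows are written.

```python
import csv
import io

calls = []  # records every time the "expensive" step actually runs


def tracked(x):
    # Hypothetical stand-in for a costly per-row transformation.
    calls.append(x)
    return x * x


# Defining the pipeline runs nothing yet: the generator only describes the work.
lazy_rows = (tracked(i) for i in range(5))
calls_before_write = len(calls)

# Writing the rows is what forces the computation, one row at a time.
buf = io.StringIO()
writer = csv.writer(buf)
for value in lazy_rows:
    writer.writerow([value])
calls_after_write = len(calls)

print("calls before write:", calls_before_write)
print("calls after write:", calls_after_write)
```

So if the export looks stuck, it may be worth timing the computation separately from the write, to see which of the two is actually slow.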

User 12 | 5/13/2016, 4:11:16 PM

Ah, the lazy evaluation does have a tendency to do that. I'm glad it's sorted out now. -Brian