Provide additional options to SFrame.save

User 1487 | 3/31/2015, 8:19:04 PM

Currently it appears to be impossible to export SFrame data in any delimited format other than standard CSV. Specifically, it would be very helpful if the API allowed one to
<ul>
<li>specify the delimiter (e.g. tab)</li>
<li>control the use of quotes and escaping</li>
</ul>
I'm facing this problem, and using a pandas DataFrame as an intermediary appears to be the only viable route, but it comes at the cost of duplicating the data in memory.

I tried to generate tab-separated lines with SFrame.apply, but saving the resulting SArray put unwanted quotes around each line and escaped every tab as '\t'.

Comments

User 1207 | 3/31/2015, 9:20:11 PM

Hello Eric239,

We're looking into adding this functionality. In the meantime, it's easy to wrap the Python csv writer to avoid the data duplication (although it may be slower). This way, everything is done with iterators and the rows are simply streamed into the file.

For example, the following writes out an SFrame X. The <a href="https://docs.python.org/2/library/csv.html">csv.DictWriter</a> class supports numerous options for formatting the output.

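<pre class="CodeBlock"><code>import csv

# X is the SFrame to export; iterating over it yields one dict per row.
# Python 2 csv expects binary mode; on Python 3 use open('out.csv', 'w', newline='').
with open('out.csv', 'wb') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=X.column_names())
    writer.writeheader()
    for d in X:
        writer.writerow(d)</code></pre>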
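
To get the tab-separated, unquoted output asked about in the original post, the same approach can pass formatting options through to csv.DictWriter. The following is only a rough sketch, not SFrame functionality: the helper name write_delimited is made up for illustration, a plain list of dicts stands in for the SFrame so the snippet runs on its own, and it is written for Python 3 (text mode with newline=''). The delimiter, quoting, escapechar and lineterminator arguments are standard csv module options.

```python
import csv


def write_delimited(rows, column_names, path, delimiter="\t",
                    quoting=csv.QUOTE_NONE, escapechar="\\"):
    """Stream dict-like rows to a delimited text file, one row at a time.

    Hypothetical helper: `rows` is any iterable of dicts keyed by column
    name (an SFrame is assumed to iterate this way).
    """
    # Python 3: open in text mode and let the csv module control newlines.
    with open(path, "w", newline="") as outfile:
        writer = csv.DictWriter(
            outfile,
            fieldnames=column_names,
            delimiter=delimiter,
            quoting=quoting,        # QUOTE_NONE: never wrap fields in quotes
            escapechar=escapechar,  # needed with QUOTE_NONE if a field contains the delimiter
            lineterminator="\n",
        )
        writer.writeheader()
        for row in rows:
            writer.writerow(row)


# Stand-in data; with a real SFrame X this would presumably be
# write_delimited(X, X.column_names(), "out.tsv").
rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
write_delimited(rows, ["id", "name"], "out.tsv")
```

With csv.QUOTE_NONE, a field that itself contains a tab is written with the escape character in front of it. If escaping is not wanted at all, such fields would have to be cleaned before writing, since the csv module raises an error when QUOTE_NONE is used without an escapechar and a field contains the delimiter.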