Pipe support for SFrame.read_csv

User 1191 | 1/28/2015, 10:20:37 PM

Hello,

I was wondering whether SFrame can replicate the pandas read_csv capability of reading from subprocess pipes. Specifically, I have a custom data stream that I read via: <code class="CodeInline">pipe = subprocess.Popen(myreader_cmd, bufsize=-1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)</code> and I want to feed this pipe into an SFrame.

This works in pandas as: <code class="CodeInline">df = pandas.read_csv(pipe.stdout) #use pandas to read csv automatically</code>

but fails using SFrame: <pre class="CodeBlock"><code> sf = SFrame.read_csv(pipe.stdout) #use SFrame to read csv </code></pre>with the following error:

<pre class="CodeBlock"><code>AttributeError                            Traceback (most recent call last)
<ipython-input-23-42a674af6f09> in <module>()
      6 xx.other_args
      7
----> 8 xx.read_sf( 20010101, 20010110)

/home/firdaus/hsfirdaus/projects/python/fjpy/dcat.py in read_sf(self, begin, end)
     98     cmd = self.__builddcatcmd__(begin, end);
     99     pipe = subprocess.Popen(cmd, bufsize=-1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
--> 100     sf = SFrame.read_csv(pipe.stdout) #use pandas to read csv automatically
    101     return sf
    102

/home/firdaus/localinstall/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in read_csv(cls, url, delimiter, header, errorbadlines, commentchar, escapechar, doublequote, quotechar, skipinitialspace, columntypehints, navalues, nrows, verbose)
    992     nrows=nrows,
    993     verbose=verbose,
--> 994     storeerrors=False)[0]
    995
    996

/home/firdaus/localinstall/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in readcsvimpl(cls, url, delimiter, header, errorbadlines, commentchar, escapechar, doublequote, quotechar, skipinitialspace, columntypehints, navalues, nrows, verbose, storeerrors)
    541
    542     proxy = UnitySFrameProxy(glconnect.getclient())
--> 543     internalurl = makeinternalurl(url)
    544
    545     if (not verbose):

/home/firdaus/localinstall/lib/python2.7/site-packages/graphlab/util.pyc in makeinternalurl(url)
     80
     81     # Try to split the url into (protocol, path).
---> 82     urlsplit = url.split("://")
     83     if len(urlsplit) == 2:
     84         protocol, path = urlsplit

AttributeError: 'file' object has no attribute 'split'</code></pre>

Thanks! -firdaus

Comments

User 1189 | 1/29/2015, 7:32:46 PM

Hi,

We don't support this right now. There are some challenges since the actual read is performed by another C++ binary in the background. But I think it might be possible. I will look into supporting this.
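
Until then, one possible workaround is to spool the pipe's output to a temporary file and hand the file path to read_csv, since the traceback shows that read_csv expects a URL or path string rather than a file object. The sketch below is only an illustration under that assumption: `spool_command_to_csv` is a hypothetical helper name, not part of GraphLab Create, and the approach only suits streams that end and fit on disk.

```python
import os
import shutil
import subprocess
import tempfile


def spool_command_to_csv(cmd):
    """Run cmd, copy its stdout into a temporary .csv file, return the path.

    Hypothetical helper for illustration; not part of GraphLab Create.
    """
    # stderr is left attached to the parent process: an unread
    # stderr=subprocess.PIPE can fill up and stall the child.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as out:
            # Stream in chunks so the whole output is never held in memory.
            shutil.copyfileobj(proc.stdout, out)
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        os.remove(path)
        raise subprocess.CalledProcessError(returncode, cmd)
    return path


# Intended use (assumes graphlab is installed and myreader_cmd is your command):
#   path = spool_command_to_csv(myreader_cmd)
#   sf = SFrame.read_csv(path)   # a path string, which read_csv accepts
#   os.remove(path)              # clean up once the SFrame is loaded
```

The cost is one extra pass over the data on disk, but it keeps the actual CSV parsing in the background binary, which only needs a path it can open itself.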

Thanks! Yucheng