GraphLab | Introduction to SFrames

User 2 | 3/11/2014, 11:42:58 AM

        <div class="EmbeddedContent"><img src="http://graphlab.com/images/GLlogo_FU_STACKED_300.png" class="LeftAlign" /><strong>GraphLab | Introduction to SFrames</strong>
           <p>An SFrame is a tabular data structure. If you are familiar with R or the pandas Python package, SFrames behave similarly to the dataframes available in those frameworks. An SFrame acts like a table made up of zero or more columns. Each column has its own datatype, and every column of a given SFrame must have the same number of entries.</p>
           <p><a href="http://graphlab.com/learn/notebooks/introduction_to_sframes.html">Read the full story here</a></p>
           <div class="ClearFix"></div>
        </div>

Comments

User 126 | 3/11/2014, 11:42:58 AM

Every time I try to run the following line of code, I get KernelRestarted (ipython):

songsf["filtered"] = songsf["title"].apply(lambda x: x if not x[0].isalnum() and not x[0].isalpha() else np.NaN)


User 15 | 3/11/2014, 7:07:00 PM

Hi Tural,

This is happening because one of the titles is an empty string, and you are trying to refer to its first character. That should raise an IndexError, but instead we are crashing. We're working on that issue.

Evan
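
A minimal sketch of a guard for the empty-title case described above, assuming empty titles should simply be treated as missing. The helper name `first_char_filter` is hypothetical, and it returns `None` rather than `np.NaN` as the missing-value marker, which is an assumption and not something confirmed in this thread. Because any character for which `isalpha()` is true also satisfies `isalnum()`, a single `isalnum()` check covers the original condition.

```python
def first_char_filter(title):
    """Keep titles whose first character is not alphanumeric; otherwise return None."""
    if not title:
        # Empty string (or None): there is no title[0] to inspect.
        return None
    return title if not title[0].isalnum() else None


# Plain-Python check of the behaviour on a few sample titles.
print([first_char_filter(t) for t in ["", "#1 Hit", "Yesterday"]])

# With an SFrame this would presumably be applied as follows (not run here):
# songsf["filtered"] = songsf["title"].apply(first_char_filter)
```

The guard runs before any indexing, so the lambda's `x[0]` lookup on an empty string never happens.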


User 286 | 5/2/2014, 8:49:24 PM

What does the "S" in "SFrame" (or "SArray") represent? I'm reminded of LISP S-expressions (the "S" stood for "symbolic").


User 15 | 5/3/2014, 5:09:58 PM

@gumption It stands for "Server-side", to make it clear that the data in your SFrame is always on the machine that will be doing computations on that data. Even though we haven't exactly made it known :). It doesn't make much of a difference when you run GraphLab Create on your laptop, but when you run it in the cloud, all SFrames' contents are on that cloud machine.


User 318 | 5/20/2014, 5:22:26 AM

Is it possible for the SFrame.read_csv method to parse tab-delimited files? I noticed that the delimiter used for parsing CSV files must be a single character, and when I tried to use '\t' I got an exception.


User 14 | 5/20/2014, 4:26:20 PM

You should be able to use '\t' as delimiter. Can you post the exception and the version of GraphLab Create?
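
One thing worth ruling out while collecting the exception (an assumption, since the thread never establishes the cause): `'\t'` is a single tab character, but a raw string `r'\t'` or an escaped `'\\t'` is two characters and would fail a single-character check. The sketch below uses only the standard library and made-up sample data to show the difference.

```python
import csv
import io

# Hypothetical two-row, tab-separated sample, for illustration only.
sample = "title\tyear\nSong A\t1999\nSong B\t2004\n"

# '\t' is one character (a tab); r'\t' is two (a backslash and the letter t).
print(len('\t'), len(r'\t'))

# Parsing with the single-character tab delimiter.
rows = list(csv.reader(io.StringIO(sample), delimiter='\t'))
print(rows)

# With GraphLab Create the equivalent call would presumably be (not run here):
# sf = gl.SFrame.read_csv('songs.tsv', delimiter='\t')
```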


User 421 | 7/3/2014, 5:28:10 PM

I was stopped at the beginning:

gl.aws.set_credentials('<AWS_ACCESS_KEY_ID>', '<AWS_SECRET_ACCESS_KEY>')
songsf = gl.SFrame.read_csv('s3://GraphLab-Datasets/millionsong/songdata.csv',
...                         column_type_hints = {'year' : int})
PROGRESS: Downloading from GraphLab-Datasets/millionsong/songdata.csv
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/pzhang/graphlab/graphlab/lib/python2.7/site-packages/graphlab/datastructures/sframe.py", line 571, in read_csv
    proxy.load_from_csv(internal_url, parsing_config, type_hints)
  File "/Users/pzhang/graphlab/graphlab/lib/python2.7/site-packages/graphlab/cython/context.py", line 23, in __exit__
    raise exc_type(exc_value)
IOError: Fail to download from s3://GraphLab-Datasets/millionsong/songdata.csv. The 'get' operation for 'millionsong/songdata.csv' failed. Connection timed out after 30001 milliseconds.: unspecified iostream_category error: unspecified iostream_category error

What is going wrong here? I was able to install other packages with pip, so it is not due to a proxy. Is there some way to access the repository to download the .csv file?


User 628 | 8/27/2014, 4:04:50 PM

Nice overview, Thx Evan ;)


User 738 | 9/22/2014, 6:52:21 PM

I want to use the nearest_neighbors toolkit described at http://graphlab.com/products/create/docs/graphlab.toolkits.nearestneighbors.html which requires a string label for each row. I see there is an 'add_row_number' method for SFrame that adds an integer ID, but I can't find a way to do the same for a string, or to convert column types. It seems like this would be an obvious tool to have given the requirements of nearest_neighbors; any chance that this will be implemented?


User 6 | 9/22/2014, 6:53:36 PM

Hi Andrea, it is easy to convert a column's type using data['column_name'] = data['column_name'].astype(str)


User 2365 | 10/16/2015, 9:29:05 PM

Hi all,

When I execute the following command in IPython Notebook, I get the following error. Can you please help me understand why? Thank you, Rajesh

sf = graphlab.SFrame('people-example.csv')


NameError                                 Traceback (most recent call last)
<ipython-input-4-4df0be298ea8> in <module>()
----> 1 sf = graphlab.SFrame('people-example.csv')

NameError: name 'graphlab' is not defined


User 2365 | 10/16/2015, 9:34:59 PM

I imported graphlab before that too


User 1207 | 10/16/2015, 10:12:24 PM

If you imported graphlab before that, did you get an error?


User 2365 | 10/17/2015, 8:14:17 AM

I didn't get any error when I imported graphlab. It took some processing time before I got a new code line in IPython Notebook.
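
One way a NameError like this can appear even after a successful import (an assumption; the thread never establishes the actual cause) is importing the package under an alias and then referring to it by its full name. The sketch below reproduces that with a standard-library module standing in for graphlab.

```python
# The module is bound only to the alias "cs"; the name "colorsys" is never defined.
import colorsys as cs

try:
    colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
    outcome = "no error"
except NameError as err:
    outcome = str(err)

print(outcome)

# Using the alias works as expected.
hsv = cs.rgb_to_hsv(1.0, 0.0, 0.0)
print(hsv)
```

If the import cell used `import graphlab as gl`, the matching call would be `gl.SFrame(...)`. It is also worth checking that the import cell had finished running in the same kernel before the next cell was executed.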


User 2514 | 10/29/2015, 8:31:26 PM

I am getting an error when defining an SFrame and I'm not sure why. I am working through week 2 of the UofW Machine learning course. The data set is from the course.

Here is my code and output from the IPython notebook:

import graphlab as gl  #fire up GraphlabCreate
sales = gl.SFrame("home_data.gl/")

output:

IOError                                   Traceback (most recent call last)
<ipython-input-3-028685c25594> in <module>()
----> 1 sales = gl.SFrame("home_data.gl/")

/Users/WTAYLO/.graphlab/anaconda/lib/python2.7/site-packages/graphlab/datastructures/sframe.pyc in __init__(self, data, format, proxy)
    865                     pass
    866                 else:
--> 867                     raise ValueError('Unknown input type: ' + format)
    868
    869         sframe_size = -1

/Users/WTAYLO/.graphlab/anaconda/lib/python2.7/site-packages/graphlab/cython/context.pyc in __exit__(self, exc_type, exc_value, traceback)
     47             if not self.show_cython_trace:
     48                 # To hide cython trace, we re-raise from here
---> 49                 raise exc_type(exc_value)
     50             else:
     51                 # To show the full trace, we do nothing and let exception propagate

IOError: Cannot open /Users/WTAYLO/home_data.gl for read. Cannot open /Users/WTAYLO/home_data.gl for reading: unspecified iostream_category error: unspecified iostream_category error: unspecified iostream_category error


User 4 | 10/29/2015, 9:17:03 PM

Hi @"Wallace Taylor", can you verify that /Users/WTAYLO/home_data.gl exists, and is a directory? The Coursera data files are provided as .zip files and you'll need to extract the .zip file in order to use the data from GraphLab Create. It should extract into a directory named home_data.gl.


User 2514 | 10/29/2015, 10:50:38 PM

Yes. I extracted files to /Users/WTAYLO/home_data.gl .. the files from the .zip are there.


User 4 | 10/29/2015, 11:10:18 PM

Can you check the permissions on that directory? It's possible that the user the Python process is running as does not have read permission on the directory.

Another common issue is that some extractors will create an extra level of directory structure. Please also verify there are not two levels of directories for home_data.gl -- the home_data.gl directory should contain only data files (cryptically named files ending in either frameidx, or 001, 002, something like that). If it contains another directory home_data.gl inside it, try moving that one up a level (so there are not two levels of home_data.gl).
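
The checks suggested above (the path exists and is a directory, it is readable, and there is no doubled directory level) can be sketched with the standard library. The function name `diagnose_sframe_dir` is hypothetical and not part of GraphLab Create. It inspects only the filesystem and does not validate the SFrame contents.

```python
import os


def diagnose_sframe_dir(path):
    """Return a list of problems found with a saved-SFrame directory path."""
    problems = []
    # Resolve "~" and relative paths so messages show the location actually checked.
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(full):
        problems.append("not a directory: " + full)
        return problems
    # Listing a directory needs read permission; entering it needs execute permission.
    if not os.access(full, os.R_OK | os.X_OK):
        problems.append("no read permission: " + full)
    # Some extractors produce home_data.gl/home_data.gl; flag that doubled level.
    nested = os.path.join(full, os.path.basename(full))
    if os.path.isdir(nested):
        problems.append("nested copy found: " + nested)
    return problems


print(diagnose_sframe_dir("home_data.gl/"))
```

An empty list means none of the three problems was detected for that path.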


User 2514 | 10/30/2015, 4:55:35 PM

So, not sure why I didn't try this first, but I used the fully qualified path ('/Users/WTAYLO/Documents/home_data.gl') and the data loads fine now. Thanks for your help Zach.
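
The fix above works because a relative path such as "home_data.gl/" is resolved against the working directory of the Python process, which may not be the directory where the data was extracted. A small sketch of how to see what a relative path resolves to (the helper name `resolve_against_cwd` is hypothetical):

```python
import os


def resolve_against_cwd(relative_path):
    """Return the absolute path that a relative path refers to right now."""
    return os.path.normpath(os.path.join(os.getcwd(), relative_path))


# Where the process is currently running, and what "home_data.gl/" means from there.
print(os.getcwd())
print(resolve_against_cwd("home_data.gl/"))
```

If the second line printed does not match where the files were extracted, either pass the full path or change the working directory first.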