Large number of columns

User 781 | 2/18/2015, 9:59:29 AM

We're developing a complex recommendation application with a large dataset and a large number of features. Has the 200-column limit been lifted in the latest release?

Comments

User 91 | 2/18/2015, 2:27:48 PM

You should be able to use SFrames with hundreds of columns without any issues.

The column limit was a limitation of a very early release of GraphLab Create; that is no longer the case. Our release notes should give you a good sense of all the features that have been added since (http://dato.com/products/create/upgrade/index.html).


User 1375 | 3/5/2015, 7:57:10 PM

When training a canonical bag-of-words text classifier in GraphLab Create, the feature space can run into the hundreds of thousands (English has ~170K distinct words). How does the SFrame represent these features internally (sparsely, I presume), and is the "hundreds of columns" guidance relevant in this context?


User 91 | 3/5/2015, 10:06:26 PM

Just to add to Brian's comments: I don't think there should be any issues with 170K words. I have worked on datasets with millions of words, and it works just fine. Supporting sparse data in all our algorithms is one of our fundamental requirements.
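To illustrate why vocabulary size doesn't translate into columns, here is a minimal pure-Python sketch of a sparse bag-of-words representation. It is an illustration of the idea, not GraphLab Create's actual internals: each document is stored as a dictionary of word counts, so storage grows with the number of distinct words in each document rather than with the full vocabulary. The tokenizer (lowercase plus whitespace split) and the helper name `bag_of_words` are assumptions for this example. In GraphLab Create itself, text is typically turned into a single column of such dictionaries (e.g. via `graphlab.text_analytics.count_words`; check the API docs for your release), not into one column per word.

```python
from collections import Counter


def bag_of_words(doc):
    """Hypothetical helper: return a sparse word-count dict for one document.

    Tokenization here is deliberately naive (lowercase + whitespace split).
    """
    return dict(Counter(doc.lower().split()))


docs = [
    "the movie was great great fun",
    "the plot was dull",
]

# One dict per row: only the words that actually occur in a document are stored.
rows = [bag_of_words(d) for d in docs]

# Compare sparse storage with what a dense one-column-per-word table would need.
vocabulary = set().union(*rows)
stored_entries = sum(len(r) for r in rows)
dense_cells = len(rows) * len(vocabulary)

print(rows[0])
print(f"vocabulary={len(vocabulary)} stored={stored_entries} dense={dense_cells}")
```

With a real corpus the gap is far larger: a short review might touch a few dozen of the ~170K possible words, so the sparse row holds a few dozen entries instead of 170K mostly-zero cells.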

Here are some resources to get you going:

Text analytics 101 with GLC: https://dato.com/learn/userguide/#ModelingdataText_analysis

A notebook on sentiment classifiers: https://dato.com/learn/gallery/notebooks/sentiment_classifier.html

Regression analysis on yelp with text data: https://dato.com/learn/gallery/notebooks/intro-regression.html

You should have no issues working with sparse data. Let us know if you need more pointers.