SVD/PCA in GraphLab Create

User 910 | 1/5/2015, 6:25:45 PM

How does one do SVD, e.g. for latent semantic analysis, in GraphLab Create?


User 954 | 1/6/2015, 2:48:40 AM

Thank you for contacting us. SVD/PCA is not currently supported in GraphLab Create, but it is on our roadmap. We prioritize our implementation based on customer feature requests.

Regarding LSA, please take a look at our <a href="">text processing toolkit</a> (topic modeling section) and our <a href="">matrix factorization model</a> in the recommender toolkit. They might help with this application.
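For reference, LSA is essentially a truncated SVD of a term-document matrix, so one interim workaround is to do it outside GraphLab Create. A minimal sketch with scikit-learn (the documents below are illustrative, and using scikit-learn here is my assumption, not the toolkit's API):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Toy corpus; any list of strings works here.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock markets fell sharply today",
    "investors sold stocks amid market fears",
]

tfidf = TfidfVectorizer()          # term-document weighting
X = tfidf.fit_transform(docs)      # sparse doc-term matrix

# Truncated SVD of the tf-idf matrix = classic LSA.
lsa = TruncatedSVD(n_components=2, random_state=0)
Z = lsa.fit_transform(X)           # each document in a 2-D latent space
print(Z.shape)                     # (4, 2)
```

The same `Z` can then be fed to a nearest-neighbor search for document similarity.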

I hope it helps.

User 2568 | 2/15/2016, 8:46:50 PM

Andy, it's been a year since this post. Any timeframe for PCA?

User 12 | 2/15/2016, 9:53:58 PM

I can't commit to a release date, but I'm actively working on dimension reduction tools this month. - Brian

User 2568 | 2/16/2016, 1:34:30 AM

That's good to know. If you need testers, I might be able to help.

Do the tools you are working on work with nominal categorical data? My current data set has 5 nominal categorical features. Three are high-cardinality, i.e., 990, 330, and 49 classes.

Any suggestions on how to do PCA-like analysis or dimension reduction on these?

User 12 | 2/16/2016, 6:22:10 PM

I guess my first question is what your ultimate task is. If the goal is regression or classification, the models in GL Create should automatically handle those categorical features without any manual encoding or dimension reduction. If the goal is to actually materialize a dense design matrix, then maybe the feature hashing transformer could work? The canonical use case is encoding bag-of-words representations of text documents, but it could work for your case as well.
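To illustrate the hashing idea, here is a minimal sketch using scikit-learn's `FeatureHasher` as a stand-in (the GL Create transformer's exact API isn't shown here, and the feature names, class labels, and hash size below are all illustrative):

```python
from sklearn.feature_extraction import FeatureHasher

# Rows with high-cardinality nominal features, as dicts of name -> category.
rows = [
    {"f1": "class_17", "f2": "class_204", "f3": "class_3"},
    {"f1": "class_990", "f2": "class_11", "f3": "class_49"},
]

# Each (feature, category) pair is hashed into one of n_features columns,
# so the output width is fixed regardless of cardinality.
hasher = FeatureHasher(n_features=256, input_type="dict")
X = hasher.transform(rows)   # sparse matrix of shape (2, 256)
print(X.shape)               # (2, 256)
```

The trade-off is occasional hash collisions, but for tree or linear models downstream that is usually acceptable.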

Thanks, Brian

User 2568 | 2/16/2016, 8:15:03 PM

I'm doing this for a Kaggle competition (Telstra Network Disruption), which I'm using as a tool to drive my expertise in ML.

I've used a boosted tree classifier, which, after exhaustive experimentation with feature engineering, gets me a leaderboard score of 0.51 (log-loss).

However, the leaders have a score of around 0.42, so clearly there is something major that I'm missing, and I'm now casting around for ideas to explore. I'll try out feature hashing later today.

User 5337 | 6/29/2016, 12:48:33 AM

We need this feature as well. As in scikit-learn, we want to use PCA to visualize our data for exploration. So the suggestion that GraphLab Create algorithms work without dimension reduction doesn't apply here, since we want to see what our dataset looks like after PCA.
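For anyone landing here, the scikit-learn version of this workflow is a few lines; a sketch using the built-in iris data as a placeholder for a real dataset:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# Project onto the first two principal components for a 2-D scatter plot.
coords = PCA(n_components=2).fit_transform(X)
print(coords.shape)   # (150, 2)
```

`coords[:, 0]` and `coords[:, 1]` can then be plotted directly, colored by `y`.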

Is this feature on the roadmap?

User 12 | 6/29/2016, 6:36:18 PM

Hi Shahar, we do plan to implement PCA, but I can't give you a date at this point, to be totally honest. In the meantime, is PCA your preferred tool for dimension reduction for visualization, or would you also consider t-SNE (or something else entirely)?
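For comparison, a t-SNE sketch with scikit-learn (again an assumption, since GL Create's eventual API may differ; the digits dataset and subsample size are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X = X[:200]   # subsample to keep the demo fast

# Non-linear 2-D embedding; perplexity must stay below the sample count.
emb = TSNE(n_components=2, random_state=0, perplexity=30.0).fit_transform(X)
print(emb.shape)   # (200, 2)
```

Unlike PCA, t-SNE has no `transform` for new points, so it is best suited to one-off exploratory plots.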

Thanks, Brian