Is it possible to create a 22GB sparse matrix using GraphLab Create in Python?

User 2788 | 12/11/2015, 11:54:06 AM

Hello everyone!

I have recently been working on a sparse matrix problem in Python. I have a categorical variable with about 5000 unique values.

Based on this, I want to create a sparse matrix by expanding the variable into these 5000 indicator features. On my machine with 8 GB of memory, Pandas cannot do this with get_dummies. So, is it possible to do it using SFrame and GraphLab Create?

If it is possible, how do I do that? If not, do you have any suggestions?

Sincerely, Zach

Comments

User 2788 | 12/11/2015, 2:55:13 PM

Help! :s :s :s :s :s


User 954 | 12/11/2015, 6:35:03 PM

Hello Zach,

GraphLab Create can definitely help you. You can use the SFrame data structure to store categorical variables. An SFrame is constrained by your disk size, not by your memory. SFrame also has native support for the dictionary type, which works well for categorical variables.
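
To illustrate the dictionary idea, here is a minimal sketch in plain Python. It is not GraphLab Create's API; the function names `one_hot_sparse` and `to_csr` and the sample values are hypothetical, chosen only for illustration. Each row stores just the category it actually has, instead of 5000 mostly-zero columns.

```python
def one_hot_sparse(values):
    """Encode each categorical value as a one-entry dict {value: 1}."""
    return [{v: 1} for v in values]


def to_csr(values):
    """Build CSR-style lists (data, indices, indptr) and the category-to-column map."""
    columns = {}
    indices = []
    for v in values:
        # Assign the next free column index the first time a category is seen.
        indices.append(columns.setdefault(v, len(columns)))
    data = [1] * len(indices)                 # one nonzero entry per row
    indptr = list(range(len(indices) + 1))    # row i owns entries indptr[i]:indptr[i+1]
    return data, indices, indptr, columns


# Hypothetical sample data standing in for the real categorical column.
rows = ["red", "green", "red", "blue"]
encoded = one_hot_sparse(rows)
data, indices, indptr, columns = to_csr(rows)
```

Storage grows with the number of rows rather than rows times categories. If SciPy is available, the three lists can be passed as `scipy.sparse.csr_matrix((data, indices, indptr))` to get an actual sparse matrix.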

Please look at this reference: https://dato.com/products/create/docs/generated/graphlab.SFrame.html

If you still have problems, send us the data schema and we can help further. Emad


User 2788 | 12/11/2015, 7:47:04 PM

Thank you! I will try it, and if I have a problem, I will let you know. :) :)