How to make movielens-100k available in GraphChi's rbm-cf

User 1205 | 1/16/2015, 2:48:13 PM

Hi,

I have successfully run the rbm-cf test case on the smallnetflix dataset, and now I am trying to apply it to the movielens-100k dataset.

I have parsed the training set and test set, and their matrix sizes (rows, columns, non-zeros) are "943 1650 80000" and "459 1410 20000". Then I ran rbm-cf as follows:

allen@allen-virtual-machine:~/Downloads/graphchi-cpp-master$ ./toolkits/collaborativefiltering/rbm --training=u1base --validation=u1test --minval=1 --maxval=5 --maxiter=6 --quiet=0 membudgetmb=6000 --D=100 --rbmalpha=0.01 --rbm_beta=0.02

The results:

[training] => [u1base]
[validation] => [u1test]
[minval] => [1]
[maxval] => [5]
[max_iter] => [6]
[quiet] => [0]
[D] => [100]
[rbm_alpha] => [0.01]
[rbm_beta] => [0.02]
INFO: sharder.hpp(startpreprocessing:370): Starting preprocessing, shovel size: 17476266
INFO: io.hpp(computematrixsize:136): Starting to read matrix-market input. Matrix dimensions: 943 x 1650, non-zeros: 80000
FATAL: io.hpp(convertmatrixmarket:562): Col index larger than the matrix col size 1651 > 1650 in line; 52548
terminate called after throwing an instance of 'char const*'
Aborted (core dumped)

Could anyone tell me how to transform movielens-100k into what GraphChi can deal with?

Best, Allen

Comments

User 6 | 1/16/2015, 3:01:05 PM

Hi, it seems the matrix market header you created is wrong: a column number of 1651 was found in the data, while your header says there are only 1650 columns.
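One way to see this kind of mismatch before running the toolkit is to compare the size line with the largest indices actually present in the file. The following is only a minimal sketch, not part of GraphChi: the function name `check_matrix_market` is hypothetical, and it assumes the file is in Matrix Market coordinate format (optional `%` comment lines, one size line "rows cols nnz", then one "row col value" entry per line with 1-based indices).

```python
def check_matrix_market(lines):
    """Compare the declared size line with the indices found in the entries.

    `lines` is any iterable of text lines, e.g. an open file object.
    Hypothetical helper for illustration; not part of GraphChi.
    """
    declared = None
    max_row = max_col = entries = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines, the banner, and comments
        fields = line.split()
        if declared is None:
            # The first non-comment line is the size line: rows cols nnz
            declared = tuple(int(x) for x in fields[:3])
            continue
        row, col = int(fields[0]), int(fields[1])
        max_row = max(max_row, row)
        max_col = max(max_col, col)
        entries += 1
    return {"declared": declared, "observed": (max_row, max_col, entries)}


if __name__ == "__main__":
    sample = [
        "%%MatrixMarket matrix coordinate real general",
        "2 2 3",
        "1 1 5",
        "2 3 4",  # column 3 exceeds the declared 2 columns
        "1 2 3",
    ]
    print(check_matrix_market(sample))
```

If the observed row or column maximum is larger than the declared value, the header needs to be corrected.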

I recommend taking a look at the GraphLab Create recommenders, which are much easier to use and don't require you to work hard to format the input. See for example: http://dato.com/learn/gallery/notebooks/basicrecommenderfunctionalities.html


User 1205 | 1/16/2015, 4:24:15 PM

Hi Danny

I changed the matrix header to "943 1651 80000", and the results are as follows:

[training] => [u1base]
[validation] => [u1test]
[minval] => [1]
[maxval] => [5]
[max_iter] => [6]
[quiet] => [0]
[D] => [100]
[rbm_alpha] => [0.01]
[rbm_beta] => [0.02]
INFO: sharder.hpp(startpreprocessing:370): Starting preprocessing, shovel size: 17476266
INFO: io.hpp(computematrixsize:136): Starting to read matrix-market input. Matrix dimensions: 943 x 1650, non-zeros: 80000
FATAL: io.hpp(convertmatrixmarket:562): Col index larger than the matrix col size 1652 > 1651 in line; 52548
terminate called after throwing an instance of 'char const*'
Aborted (core dumped)

I will try GraphLab Create tomorrow instead, but I still want to learn the input format of rbm-cf. Could you show me how to deal with that?

Best, Allen


User 1205 | 1/17/2015, 1:52:15 AM

Hi Danny,

Thank you for your reply.

I want to run some experiments on movielens-100k with rbm-cf. Could you show me how to use rbm-cf in GraphLab Create? I can't find this algorithm there. I also want to make it work in GraphChi. Could you show me how to prepare the dataset so that GraphChi can read it?


User 6 | 1/17/2015, 6:54:50 AM

RBM is not yet implemented in GraphLab Create. You should fix the matrix market header: the number of columns should match the correct number of items.


User 1205 | 1/17/2015, 8:54:42 AM

Hi Danny,

Thank you for your reply. I ran "./toolkits/parsers/consecutivematrixmarket --file_list=list --csv=1" separately on the training and test datasets to generate their matrix market headers, then wrote the output into "u1base" and "u1test". I think this should be the right way to get the matrix header, is that right? Thank you for your tips.

Best, Allen


User 6 | 1/17/2015, 2:41:46 PM

Something is still wrong in your count of how many unique items are in column 2. Please read the explanation here: http://bickson.blogspot.co.il/2012/02/matrix-market-format.html
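As an illustration of the format discussed above, here is a rough sketch of building the Matrix Market text directly from the raw MovieLens rating lines, which are tab-separated as "user id, item id, rating, timestamp". The function name `movielens_to_matrix_market` and its `n_rows`/`n_cols` parameters are hypothetical and not part of GraphChi; sizing the header by the largest ids seen is an assumption based on the error message in this thread.

```python
def movielens_to_matrix_market(rating_lines, n_rows=None, n_cols=None):
    """Convert MovieLens-style rating lines to Matrix Market coordinate text.

    Each input line is expected to hold: user id, item id, rating, timestamp,
    separated by whitespace. The timestamp is dropped.
    Hypothetical helper for illustration; not part of GraphChi.
    """
    entries = []
    for line in rating_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        entries.append((int(fields[0]), int(fields[1]), float(fields[2])))

    # Assumption: unless told otherwise, size the matrix by the largest ids seen.
    rows = n_rows if n_rows is not None else max((u for u, _, _ in entries), default=0)
    cols = n_cols if n_cols is not None else max((i for _, i, _ in entries), default=0)

    out = [
        "%%MatrixMarket matrix coordinate real general",
        f"{rows} {cols} {len(entries)}",  # size line: rows cols non-zeros
    ]
    out.extend(f"{u} {i} {r:g}" for u, i, r in entries)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = [
        "1\t1\t5\t874965758",
        "1\t3\t4\t876893171",
        "2\t2\t3\t878542960",
    ]
    print(movielens_to_matrix_market(sample), end="")
```

The full movielens-100k dataset has 943 users and 1682 movies, so one option is to pass `n_rows=943, n_cols=1682` for both the training and the validation file so that their headers agree. Whether rbm-cf needs anything beyond this is not confirmed here.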