Question Regarding graphchi and SVD

User 3027 | 1/14/2016, 2:36:39 PM

Came across your toolkit and am very interested in using it for large scale recommendations. I am having a few issues and hoping you can be of help. As per the following link (which i am learning from) i am utilizing the following matrix from the following blog and use SVD to fill in recommendations for missing values in the matrix I.e. as per the link http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/comment-page-1/#comment-112170 I want to apply the SVD via graphchi to convert the following matrix

  D1    D2  D3     D4

U1 5 3 - 1 U2 4 - - 1 U3 1 1 - 5 U4 1 - - 4 U5 - 1 5 4 to the following matrix D1 D2 D3 D4 U1 4.97 2.98 2.18 0.98 U2 3.97 2.40 1.97 0.99 U3 1.02 0.93 5.32 4.93 U4 1.0 0.85 4.59 3.93 U5 1.36 1.07 4.89 4.12 As you can see [U1, D3] is 2.18

Now utilizing graphchi i created the following file %%MatrixMarket matrix coordinate real general % Generated 13-Jan-2016 5 4 13 1 1 5 2 1 4 3 1 1 4 1 1 1 2 3 3 2 1 5 2 1 5 3 5 1 4 1 2 4 1 3 4 5 4 4 4 5 4 4

And ran the following command: toolkits/collaborativefiltering/svd --training=A3 --nsv=5 --nv=5 --maxiter=4 --quiet=1

The output for U is A3_U.mm %%MatrixMarket matrix array real general %This file contains SVD output matrix U. In each row nconv factors of a single user node. 5 5 -4.368959271818e-01 6.692412459375e-01 -2.962775079327e-01 -4.863747510406e-01 -1.920868121222e-01 -2.971749833499e-01 4.430872669496e-01 -5.015708170233e-02 7.959112293175e-01 2.817273244459e-01 -5.158972824760e-01 -1.363151756024e-01 5.489319278056e-01 -2.861220280729e-01 5.762604363667e-01 -3.999963523688e-01 -1.107738227289e-01 4.834938547033e-01 2.056927077123e-01 -7.427356735392e-01 -5.428276799395e-01 -5.700325970518e-01 -6.120550110472e-01 7.608949757941e-02 -5.898059818321e-17

And for V is A3_V.mm %%MatrixMarket matrix array real general %This file contains SVD output matrix V. In each row nconv factors of a single item node. 4 5 -4.748899799195e-01 7.820302508374e-01 -1.721237898201e-01 3.650775187891e-01 -2.206533861859e-17 -2.623434753482e-01 2.089135611270e-01 -2.522424693189e-01 -9.076919970956e-01 -3.013566671523e-18 -3.005118049473e-01 -4.575447180403e-01 -8.108900572877e-01 2.068883782932e-01 7.456507311408e-18 -7.844412425425e-01 -3.680171758844e-01 4.992038187373e-01 3.292811970395e-03 1.161434994292e-17

As i understand U * transpose(V) should fill in the middle values and if i try and reconstruct [U1,D3] by multiplying [-4.368959271818e-01 6.692412459375e-01 -2.962775079327e-01 -4.863747510406e-01 -1.920868121222e-01 ] with [-3.005118049473e-01 -4.575447180403e-01 -8.108900572877e-01 2.068883782932e-01 7.456507311408e-18] and then add them up

Its no where around the same value produced by the algorithm in the blog (maybe ALS). Appreciate your help, (PS: i am running graphchi on Mac Air OS X)

Comments

User 1592 | 1/14/2016, 4:00:10 PM

Hi, I would separate your problem into two 1) understanding the math 2) understanding how to use some code to compute the math

Regarding (1), the blog post you mention does not explain SVD but just a simple matrix factorization A = UV'. SVD (See wikipedia https://en.wikipedia.org/wiki/Singularvaluedecomposition) computes two matrices, but also a diagonal matrix UDV' you need to multiply to get the prediction. The matching GraphCHi code is here: https://github.com/GraphChi/graphchi-cpp/blob/master/toolkits/collaborative_filtering/svd.cpp#L79. You can include another matrix market file given using the --test=filename flag to compute the prediction correctly.

Regarding (2) please note that we have newer code in GraphLab Create, it is very recommended to switch to our newer codebase. You have there an implementation of the ALS and SGD algorithms, as documented here: https://dato.com/products/create/docs/generated/graphlab.recommender.factorizationrecommender.create.html#graphlab.recommender.factorizationrecommender.create. Those algorithms can compute the exact task at hand but way more easily.


User 3027 | 1/14/2016, 4:27:07 PM

Thanks for the input, agreed missing D here as per my reading of the SVD algo. Isn't this generated by the toolkit itself. Btw i am using the latest code base but was following the blog @ http://bickson.blogspot.com/2012/12/collaborative-filtering-with-graphchi.html to understand how to apply. As part of running grapchi from commandline i only have 2 *.mm files XU.mm and XV.mm. What file contains the D matrix? Will look at the second link but seems like python implementation and at some point i will have a very large file so would like to understand the usage.


User 1592 | 1/15/2016, 8:06:18 AM

Hi The python is a thin interface, we have example scaling up to 100,000,000,000 non zeros so you should try it out we have very good performance. You can read an explanation about the architecture and performance here: http://blog.dato.com/how-fast-are-out-of-core-algorithms

Regarding svd in GraphChi, as seen here: https://github.com/GraphChi/graphchi-cpp/blob/master/toolkits/collaborativefiltering/svd.cpp#L429 the output of the singular values vector is written to a file name named yourinputfilename.singular_values


User 3027 | 1/15/2016, 2:58:35 PM

Thanks Danny, btw figured out the singular matrices, however UDV' multiplication for the recommendation for element [U1, D3] returns zero. Also from the commandline how can i generate the recommendations. -testfile=x actually returns the original matrix. Do have a basic understanding of the math, trying to see how to utilize the commandline. Still working through python wrapper but since i have come far down this path, would like to see it work too. Regards,


User 1592 | 1/15/2016, 3:33:36 PM

Hi The test file should include the non zero entries of the matrix you want to compute the missing values. In your case a test file which will compute the single entry [1,3] is

%%MatrixMarket matrix coordinate real general 5 4 1 1 3 1


User 3027 | 1/15/2016, 6:22:29 PM

As you can see my first step is understand the results compared to the other blog, but expect differences via algo and implementation. With the Matrix Market file above and the following cmd line option toolkits/collaborativefiltering/svd --training=A3 --nsv=5 --nv=5 --maxiter=4 --quiet=1 --test=out out.predict is as follows

%%MatrixMarket matrix coordinate real general %This file contains predictions of user/item pair, one prediction in each line. The first column is user id. The second column is the item id. The third column is the computed prediction. 5 4 1 1 3 -3.3306691e-16

So the value predicted is around '0'. Like to understand if this is expected behavior, regards


User 3027 | 1/15/2016, 8:04:41 PM

Also the following file produces a seg fault %%MatrixMarket matrix coordinate real general % Generated 15-Jan-2016 4 4 16 1 1 .1 2 1 .2 3 1 .3 4 1 .4 1 2 .1 2 2 .2 3 2 .3 4 2 .4 1 3 .1 2 3 .2 3 3 .3 4 3 .4 1 4 .1 2 4 .2 3 4 .3 4 4 .4

toolkits/collaborativefiltering/svd --training=A5 --nsv=3 --nv=4 --maxiter=2 --quiet=1 WARNING: common.hpp(print_copyright:214): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com [training] => [A5] [nsv] => [3] [nv] => [4] [max_iter] => [2] [quiet] => [1] Load matrix A5 Starting iteration: 1 at time: 0.070909 Starting step: 1 at time: 0.109074 Starting step: 2 at time: 0.173486 Starting step: 3 at time: 0.286066 set status to tol set status to tol Number of computed signular values 4 Singular value 0 2.52489 Error estimate: 4.65347e-16 Singular value 1 0.715207 Error estimate: 6.73988e-16 Singular value 2 0.491065 Error estimate: 1.68477e-15 Going to save output vectors U and V Segmentation fault: 11


User 1592 | 1/20/2016, 1:48:24 PM

Hi It seems there is a bug in GraphCHi when saving the matrix V to file. It may be related to the fact that the matrix is square. As far as I remember we do not support square matrices. When I add a row, we get the following which works fine

%%MatrixMarket matrix coordinate real general % Generated 15-Jan-2016 4 5 17 1 1 .1 2 1 .2 3 1 .3 4 1 .4 1 2 .1 2 2 .2 3 2 .3 4 2 .4 1 3 .1 2 3 .2 3 3 .3 4 3 .4 1 4 .1 2 4 .2 3 4 .3 4 4 .4 1 5 3


User 1592 | 1/20/2016, 1:55:24 PM

p.s. Regarding the prediction, your result is correct it is around 0. Note that you are comparing two different algorithms - matrix factorization vs. SVD and thus the results should not be the same anyway.


User 3027 | 1/20/2016, 4:13:15 PM

Thanks Danny, for matrix factorization (for predictions) what would you suggest we use.