Incorrect SVD results

User 581 | 8/11/2014, 11:31:24 PM

In the course of using the spectral clustering implementation, I noticed that I was getting some odd intermediate results for the SVD. After playing around with the SVD implementation on its own and trying different values of the max_iter parameter, I'm still getting results that are wildly off on small examples. For example, I run

    graphlab/release/toolkits/collaborativefiltering/svd testsvd2.csv --rows=5 --cols=5 --nsv=2 --nv=5 --maxiter=10 --predictions=testsvd2 --save_vectors=1

and get these singular values:

    cat testsvd2.singular_values
    %%GraphLab SVD Solver library. This file contains the singular values.
    1.551999588782
    1
    0.9977049507311
    0.7342006306533
    2.210253479003e-17

whereas when I enter the same matrix into R, I get different results for both the singular values and the singular vectors:

    mat
         [,1]     [,2]      [,3]      [,4]     [,5]
    [1,]    1 0.000000 0.0000000 0.0000000 0.000000
    [2,]    0 1.000000 0.1309110 0.2942640 0.308753
    [3,]    0 0.130911 1.0000000 0.0110911 0.202304
    [4,]    0 0.294264 0.0110911 1.0000000 0.774086
    [5,]    0 0.308753 0.2023040 0.7740860 1.000000
    svd(mat)
    $d
    [1] 2.0000000 1.0039934 1.0000000 0.7934823 0.2025243

    $u
               [,1]        [,2] [,3]       [,4]         [,5]
    [1,]  0.0000000  0.00000000    1  0.0000000  0.000000000
    [2,] -0.4061238  0.18943681    0 -0.8939334  0.007762908
    [3,] -0.1902863  0.92498499    0  0.2809812 -0.171007199
    [4,] -0.6199773 -0.31774815    0  0.2083660 -0.686474996
    [5,] -0.6438034 -0.08690562    0  0.2802076  0.706716602

    $v
               [,1]        [,2] [,3]       [,4]         [,5]
    [1,]  0.0000000  0.00000000    1  0.0000000  0.000000000
    [2,] -0.4061238  0.18943681    0 -0.8939334  0.007762908
    [3,] -0.1902863  0.92498499    0  0.2809812 -0.171007199
    [4,] -0.6199773 -0.31774815    0  0.2083660 -0.686474996
    [5,] -0.6438034 -0.08690562    0  0.2802076  0.706716602

Is there an explanation for this discrepancy?

I'm attaching the original input and the additional output files from GraphLab in case they help (I added .txt to the end of each filename so the forum would let me upload them).

Thank you!

Comments

User 6 | 8/12/2014, 5:23:43 AM

Hi, you have a mistake in preparing the input: the row and column indices should start from zero, not 1. With one-based indices, the SVD you ran trims the last row and column of the matrix, which is why your results differ from R.

Here is the fixed input file:

    0 0 1.0
    1 1 1.0
    2 2 1.0
    3 3 1.0
    4 4 1.0
    1 2 0.130911
    2 1 0.130911
    2 3 0.0110911
    3 2 0.0110911
    1 3 0.294264
    3 1 0.294264
    1 4 0.308753
    4 1 0.308753
    2 4 0.202304
    4 2 0.202304
    3 4 0.774086
    4 3 0.774086

Alternatively, you can keep the one-based file and run with --inputfileoffset=1. I guess you may need to add this flag to the spectral clustering toolkit as well.
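If you would rather fix the file itself than pass --inputfileoffset=1, the change is just an index shift. Below is a minimal Python sketch, assuming whitespace-separated "row col value" triplets like the fixed file in this reply; the helper names shift_triplets and out_of_range are hypothetical and not part of GraphLab. It also illustrates the point above: with --rows=5 --cols=5, every one-based entry that uses index 5 falls outside the zero-based range 0..4.

    # Hypothetical helpers (not part of GraphLab). Assumes each non-blank line
    # is a whitespace-separated "row col value" triplet.

    def shift_triplets(lines, offset=1):
        """Subtract `offset` from the row and column index of each triplet line."""
        shifted = []
        for line in lines:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            row, col, value = fields
            # Keep the value string untouched so no precision is lost.
            shifted.append(f"{int(row) - offset} {int(col) - offset} {value}")
        return shifted


    def out_of_range(lines, rows, cols):
        """Return the triplet lines whose indices fall outside [0, rows) x [0, cols)."""
        bad = []
        for line in lines:
            fields = line.split()
            if not fields:
                continue
            row, col = int(fields[0]), int(fields[1])
            if not (0 <= row < rows and 0 <= col < cols):
                bad.append(line)
        return bad


    if __name__ == "__main__":
        one_based = ["1 1 1.0", "5 5 1.0", "4 5 0.774086", "5 4 0.774086"]
        # Read as zero-based, every entry touching index 5 is outside a 5x5 matrix.
        print(out_of_range(one_based, 5, 5))
        # prints ['5 5 1.0', '4 5 0.774086', '5 4 0.774086']
        # After shifting by one, all indices lie in 0..4.
        print(shift_triplets(one_based))
        # prints ['0 0 1.0', '4 4 1.0', '3 4 0.774086', '4 3 0.774086']

In practice you would read the lines from your input file and write the shifted lines to a new one, assuming it uses this whitespace-separated layout; the file I/O is left out to keep the sketch self-contained.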

Here are the results of the fixed run:

    ./svd A3 --rows=5 --cols=5 --nsv=5 --nv=6 --maxiter=3 --quiet=1 --savevectors=1 --predictions=out
    GRAPHLABSUBNETID/GRAPHLABSUBNETMASK environment variables not defined.
    Using default values
    Subnet ID: 0.0.0.0
    Subnet Mask: 0.0.0.0
    Will find first IPv4 non-loopback address matching the subnet
    Loading graph.
    Loading graph. Finished in 0.005247
    Finalizing graph.
    Finalizing graph. Finished in 0.041158
    ========== Graph statistics on proc 0 ===============
    Num vertices: 10
    Num edges: 17
    Num replica: 10
    Replica to vertex ratio: 1
    Num local own vertices: 10
    Num local vertices: 10
    Replica to own ratio: 1
    Num local edges: 17
    Edge balance ratio: 1
    Creating engine
    Running SVD (gklanczos)
    (C) Code by Danny Bickson, CMU
    Please send bug reports to danny.bickson@gmail.com
    Updates: 5
    ...
    set status to tol
    set status to tol
    Updates: 5
    Updates: 5
    Number of computed signular values 6
    Updates: 5
    Updates: 5
    Singular value 0 2 Error estimate: 1.64251e-15
    Updates: 5
    Updates: 5
    Singular value 1 1.00399 Error estimate: 1.01041e-14
    Updates: 5
    Updates: 5
    Singular value 2 1 Error estimate: 5.09373e-15
    Updates: 5
    Updates: 5
    Singular value 3 0.793482 Error estimate: 1.93418e-15
    Updates: 5
    Updates: 5
    Singular value 4 0.202524 Error estimate: 1.32544e-14
    Saving singular value triplets to files: out.U. and out.V.
    Saving predictions
    Final Runtime (seconds): 0.576169
    Updates executed: 5
    Update Rate (updates/second): 8.67801

As you can easily verify, our results match R:

    [1] 2.0000000 1.0039934 1.0000000 0.7934823 0.2025243
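To double-check numbers like these without R, here is a small pure-Python sketch. It is an illustration, not GraphLab's gklanczos solver: it builds the 5x5 matrix from the zero-based triplets and computes its eigenvalues with the cyclic Jacobi rotation method, a technique swapped in only because it is short and needs no external libraries. Because this matrix is symmetric, its singular values are the absolute values of its eigenvalues, so sorting them should reproduce R's $d vector. The function names are hypothetical.

    import math

    # Fixed (zero-based) input file from this thread, one "row col value" triplet per line.
    FIXED_INPUT = """\
    0 0 1.0
    1 1 1.0
    2 2 1.0
    3 3 1.0
    4 4 1.0
    1 2 0.130911
    2 1 0.130911
    2 3 0.0110911
    3 2 0.0110911
    1 3 0.294264
    3 1 0.294264
    1 4 0.308753
    4 1 0.308753
    2 4 0.202304
    4 2 0.202304
    3 4 0.774086
    4 3 0.774086
    """


    def build_matrix(text, n):
        """Build a dense n x n matrix (list of lists) from zero-based triplet lines."""
        a = [[0.0] * n for _ in range(n)]
        for line in text.splitlines():
            fields = line.split()
            if fields:
                a[int(fields[0])][int(fields[1])] = float(fields[2])
        return a


    def jacobi_eigenvalues(a, tol=1e-22, max_sweeps=100):
        """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
        n = len(a)
        a = [row[:] for row in a]  # work on a copy
        for _ in range(max_sweeps):
            # Stop once the squared off-diagonal mass is negligible.
            off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
            if off < tol:
                break
            for p in range(n - 1):
                for q in range(p + 1, n):
                    if a[p][q] == 0.0:
                        continue
                    # Choose the rotation angle that zeroes a[p][q].
                    theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                    t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                    c = 1.0 / math.sqrt(t * t + 1.0)
                    s = t * c
                    # A <- A J: update columns p and q ...
                    for k in range(n):
                        akp, akq = a[k][p], a[k][q]
                        a[k][p] = c * akp - s * akq
                        a[k][q] = s * akp + c * akq
                    # ... then A <- J^T A: update rows p and q.
                    for k in range(n):
                        apk, aqk = a[p][k], a[q][k]
                        a[p][k] = c * apk - s * aqk
                        a[q][k] = s * apk + c * aqk
        return [a[i][i] for i in range(n)]


    def singular_values_symmetric(a):
        """For a symmetric matrix, the singular values are the absolute eigenvalues."""
        return sorted((abs(x) for x in jacobi_eigenvalues(a)), reverse=True)


    if __name__ == "__main__":
        sv = singular_values_symmetric(build_matrix(FIXED_INPUT, 5))
        print(" ".join(f"{x:.7f}" for x in sv))

For real work, numpy.linalg.svd or R's svd() are the obvious tools. All five eigenvalues here are positive, so the matrix is symmetric positive definite, which is consistent with $u and $v being identical in the R output.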


User 581 | 8/12/2014, 6:14:46 PM

Thanks Danny. Adding the --inputfileoffset option does indeed fix things. I made a pull request to spectral_clustering.cpp with that change, in case that's helpful.


User 6 | 8/12/2014, 7:07:05 PM

Thanks, I just merged your pull request.