Output in Ubuntu vs. Mac OS

User 451 | 7/7/2014, 6:55:55 PM

Hi, I ran the same script, which uses SVD from GraphLab, on two operating systems: Linux Ubuntu 12.04 and Mac Mavericks 10.9.2. I got different results. Which one is correct?

Additional info:
Linux, Ubuntu 12.04 - gcc 4.6.3
Mac, Mavericks 10.9.2 - g++ 4.2.1

I ran svd using the following command from the script (all the parameter values come from the script):

./svd "$outputdir"/"$dir" --rows=943 --cols=1682 --nsv="$nsvvalue" --nv="$nsvvalue" --maxiter="$maxitervalue" --quiet=1 --savevectors=1 --predictions="$outputdir"/outnsv"$nsvvalue"maxiter"$maxitervalue"

Example of results on Mac:

934 1411 0.013329618
807 1411 -0.076884895
846 1411 -0.13472639
194 1411 -0.023502461
246 1411 -0.061229785
279 1411 -0.12870842
305 1411 0.0924454
497 1415 -0.022377869

Results on the same data from Linux:

934 1411 0.26674098
807 1411 0.13379498
846 1411 0.67894632
194 1411 0.34583301
246 1411 -0.015906042
279 1411 0.042949241
305 1411 0.24515234
497 1415 -0.0019626303

Attached the full prediction results.

Thanks, Rada

Comments

User 6 | 7/7/2014, 7:00:30 PM

Hi Rada, can you please provide the output of the two svd runs? As part of the output, we print the accuracy of each singular value triplet and how many triplets converged. For example, in our tutorial http://docs.graphlab.org/collaborative_filtering.html we get:

Number of computed signular values 4
Singular value 0 2.16097 Error estimate: 1.05039e-15
Singular value 1 0.97902 Error estimate: 1.32491e-15
Singular value 2 0.554159 Error estimate: 9.92283e-16
Singular value 3 1.05388e-64 Error estimate: 3.42194e-16

I see one problem with your command-line arguments: nv should be larger than nsv. The number of vectors (nv) is the buffer we work with. You should try a few values of nv to see which gives you the best accuracy (vs. runtime). Also, what is max_iter in your runs?


User 451 | 7/7/2014, 7:22:26 PM

The full execution example:

./svd datasets/movielens/uitems0.data/uitems0.data --rows=943 --cols=1682 --nsv=5 --nv=5 --maxiter=3 --quiet=1 --savevectors=1 --predictions=outnsv5maxiter_3

I read that it is OK to set nv equal to nsv. max_iter is the parameter that sets the maximum number of iterations.

Attached both outputs with the singular values.

Thanks


User 6 | 7/7/2014, 7:25:55 PM

The bottom line is here:

Singular value 0 510.514 Error estimate: 1.73079
Singular value 1 309.392 Error estimate: 2.36234

The error estimates are very bad (good estimates are, for example, smaller than 1e-8), and that is why you get different results. Try running with nv=15, max_iter=10 and tol=1e-6.
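To check a whole run at once, a small script can scan the solver output for triplets whose error estimate is too large. The following is only a sketch: it assumes the output lines look exactly like the ones quoted in this thread, the helper names `parse_triplets` and `unconverged` are made up for illustration, and the 1e-8 threshold is just the rule of thumb mentioned above, not a value enforced by the toolkit.

```python
import re

# Pattern for lines shaped like the solver output quoted in this thread, e.g.
#   "Singular value 0 510.514 Error estimate: 1.73079"
TRIPLET = re.compile(
    r"Singular value\s+(\d+)\s+(\S+)\s+Error estimate:\s+(\S+)"
)


def parse_triplets(text):
    """Return a list of (index, singular_value, error_estimate) tuples."""
    return [(int(i), float(s), float(e)) for i, s, e in TRIPLET.findall(text)]


def unconverged(text, threshold=1e-8):
    """Return indices of triplets whose error estimate exceeds the threshold.

    The default threshold is the rule of thumb from this thread (an assumption,
    not a constant taken from the solver).
    """
    return [i for i, _, err in parse_triplets(text) if err > threshold]


# Output quoted above from the run with nsv=5, nv=5, maxiter=3.
sample = (
    "Singular value 0 510.514 Error estimate: 1.73079\n"
    "Singular value 1 309.392 Error estimate: 2.36234\n"
)

print(unconverged(sample))  # indices of triplets that have not converged
```

If the returned list is not empty, the predictions from that run should not be compared across machines yet.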


User 451 | 7/7/2014, 7:32:22 PM

Hi Danny,

I am trying to check how different configurations affect the results of the algorithm, not to find the best configuration. I would like to know why the results differ between the two operating systems. I ran the same example on a third Linux machine (also Ubuntu), and the results on the two Linux machines were identical. My question is why the results differ between the two operating systems, and whether I can run the SVD on Mac too.

Thanks :) Rada


User 6 | 7/8/2014, 4:51:27 AM

Hi Rada, as you can see here (https://github.com/graphlab-code/graphlab/blob/master/toolkits/collaborative_filtering/svd.cpp#L87-L93), the SVD solver starts from a random state. The parameters you gave do not allow it to converge properly to the right result. We do not set the initial random number seed, so the two Ubuntu machines start from a default seed and thus have similar results. In any case, you should verify that the algorithm has converged to the right accuracy before comparing results.
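The effect of the random start can be reproduced on a toy problem. The sketch below is not the GraphLab solver: it uses plain power iteration on A^T A, a 2x2 matrix chosen for illustration, and Python's `random` module as a stand-in for the platform's random number generator. It only illustrates the general point that an iterative method stopped too early returns an answer that depends on the starting vector, while a converged run does not.

```python
import math
import random

# Toy matrix with singular values 3.0 and 2.9 (an assumption for illustration;
# this is not the MovieLens data discussed in the thread).
A = [[3.0, 0.0],
     [0.0, 2.9]]


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def top_singular_value(seed, iters):
    """Estimate the largest singular value by power iteration on A^T A,
    starting from a random vector drawn with the given seed."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]  # random initial state
    At = transpose(A)
    for _ in range(iters):
        x = matvec(At, matvec(A, x))
        n = norm(x)
        x = [xi / n for xi in x]  # renormalize to avoid overflow
    return norm(matvec(A, x)) / norm(x)


for iters in (3, 2000):
    estimates = [top_singular_value(seed, iters) for seed in (1, 2)]
    print(iters, estimates)
```

With only 3 iterations the two seeds give different estimates; with 2000 iterations both agree on the true value 3.0. Two machines whose default seeds differ behave like the two seeds here.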


User 451 | 7/8/2014, 7:37:49 AM

Hi Danny,

Thank you very much for your response! How can I allow the algorithm to converge to the right accuracy?

Thanks!


User 6 | 7/8/2014, 8:09:44 AM

Try to run with nv=15 and max_iter=10 and tol=1e-6 and send me the output


User 451 | 7/8/2014, 8:31:00 AM

Attached the full output.

Singular value 0 510.514 Error estimate: 0.0317622
Singular value 1 193.111 Error estimate: 1.15006
Singular value 2 174.765 Error estimate: 0.926735
Singular value 3 153.801 Error estimate: 0.264399
Singular value 4 128.821 Error estimate: 2.22544
Singular value 5 120.499 Error estimate: 2.17384
Singular value 6 124.934 Error estimate: 1.38104
Singular value 7 113.98 Error estimate: 2.13314
Singular value 8 114.585 Error estimate: 1.81912
Singular value 9 114.851 Error estimate: 0.842978

Thanks :)


User 6 | 7/8/2014, 9:18:00 AM

Error estimates are still too high - please send me your input matrix and I will take a look at it...


User 451 | 7/8/2014, 9:23:31 AM

Hi, Attached the data.


User 6 | 7/8/2014, 2:51:48 PM

Hi Rada, I ran on your dataset using the command with nv=105 and got very good results:

Number of computed signular values 41
Singular value 0 510.514 Error estimate: 0.00195881
Singular value 1 193.111 Error estimate: 1.14221e-13
Singular value 2 174.765 Error estimate: 1.24491e-13
Singular value 3 130.751 Error estimate: 1.01898e-13
Singular value 4 128.796 Error estimate: 8.31967e-14

Final Runtime (seconds): 17.5418
Updates executed: 942
Update Rate (updates/second): 53.7002


User 451 | 7/9/2014, 6:19:33 AM

Hi Danny,

Great. Thank you very much! Is the error estimate the deviation of the actual values from the ones computed using the singular value?

Thanks, Rada


User 6 | 7/9/2014, 7:05:24 AM

Yes.

Following SLEPc, the error measure is computed as:

$$\frac{\sqrt{\lVert A v_i - \sigma_i u_i \rVert_2^2 + \lVert A^T u_i - \sigma_i v_i \rVert_2^2}}{\sigma_i}$$

Namely, it measures the deviation of the approximation $\sigma_i u_i$ from $A v_i$, and of $\sigma_i v_i$ from $A^T u_i$.

Here $\sigma_i$ is the singular value, $A$ is the matrix, $A^T$ is the matrix transposed, and $u_i$ and $v_i$ are the matching singular vectors for $\sigma_i$.
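As a worked illustration of the formula above, here is a minimal sketch in plain Python. The function name `error_estimate` and the 2x2 test matrix are assumptions made for this example; this is not the toolkit's code, only a direct transcription of the expression into list arithmetic.

```python
import math


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def sq_norm_diff(a, b):
    """Squared 2-norm of the difference a - b."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def error_estimate(A, u, v, sigma):
    """sqrt(||A v - sigma u||^2 + ||A^T u - sigma v||^2) / sigma"""
    left = sq_norm_diff(matvec(A, v), [sigma * ui for ui in u])
    right = sq_norm_diff(matvec(transpose(A), u), [sigma * vi for vi in v])
    return math.sqrt(left + right) / sigma


# Hypothetical example matrix with singular values 3 and 2.
A = [[3.0, 0.0],
     [0.0, 2.0]]

# Exact triplet for sigma = 3: the residuals vanish.
print(error_estimate(A, u=[1.0, 0.0], v=[1.0, 0.0], sigma=3.0))

# Wrong right singular vector: the residuals are large.
print(error_estimate(A, u=[1.0, 0.0], v=[0.0, 1.0], sigma=3.0))
```

An exact triplet gives an estimate of zero, and a poor approximation gives a value of order one, which is the range seen in the unconverged runs earlier in the thread.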


User 451 | 7/9/2014, 7:58:49 AM

Great!

Thanks :)