User 547 | 7/29/2014, 6:47:24 PM

Hi,
I have a Matrix Market file with the header 62k 70million 600million, and I need to run WALS and RBM on this matrix. I tried WALS with GraphLab, but the rank/dimension of the factorized U and V is very limited, i.e. when I increase the dimension it runs out of memory. So I tried GraphChi:
./toolkits/collaborative_filtering/wals --training=time_smallnetflix --validation=time_smallnetflixe --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=1
and it seems I can't define the dimension, so I'm wondering whether there is a way to tune the dimension for WALS.

I have the same problem with RBM in GraphChi: are the parameters hard-coded, or is there a way to tune alpha, beta and D?

Also, when I run WALS (with the weight column added) and RBM on the following matrix from your website,

%%MatrixMarket matrix coordinate real general
% Generated 27-Feb-2012
3 4 9
1 1 0.8147236863931789
2 1 0.9057919370756192
3 1 0.1269868162935061
2 2 0.6323592462254095
3 2 0.09754040499940952
1 3 0.2784982188670484
2 3 0.5468815192049838
3 3 0.9575068354342976
2 4 0.1576130816775483

each row in the U and V matrices has a different number of columns, which is basically the dimension of the user/item feature vectors. Below is the number of columns per row in the U and V matrices:

5 17 2 60 60 60

So I'm wondering whether this result is correct, and how I can get the same number of features/dimension for each row.

Thanks, Tara

User 547 | 7/30/2014, 2:26:16 AM

Hi,

Please ignore the question regarding the RBM alpha and beta parameters. I figured that part out, i.e. I can tune the RBM parameters on the command line and it works fine. But I still have the problem with the dimension of the U and V output/factorized matrices.

Thanks, Tara

User 6 | 7/30/2014, 5:18:49 AM

Hi Tara, You can tune the feature vector width with --D=XX in both cases. Did you look at the output? The reason the first 3 lines have a different number of items is that we write the Matrix Market header. If you would like to ignore the header, you are welcome to remove the first 3 lines.

%%MatrixMarket matrix array real general // 5 words
%This file contains WALS output matrix U. In each row D factors of a single user node // 17 words
rows cols // 2 words
// 60 values on each row (I guess D=60 in your case)
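
The header layout described above can be checked with a short script. This is a minimal sketch in Python, assuming the output file has exactly the layout shown in this reply (banner line, comment line, a "rows cols" line, then one row of D values per user). The function name `read_factor_rows` and the sample values are made up for illustration and are not real GraphChi output.

```python
def read_factor_rows(text):
    """Return (rows, cols, data) from text laid out as described above."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Drop the banner and comment lines, which both start with '%'.
    body = [ln for ln in lines if not ln.startswith("%")]
    # The first remaining line is the "rows cols" size line.
    rows, cols = (int(tok) for tok in body[0].split())
    data = [[float(tok) for tok in ln.split()] for ln in body[1:]]
    return rows, cols, data


# Made-up sample with 2 users and D=3 (hypothetical values).
SAMPLE = """%%MatrixMarket matrix array real general
%This file contains WALS output matrix U. In each row D factors of a single user node
2 3
0.1 0.2 0.3
0.4 0.5 0.6
"""

rows, cols, data = read_factor_rows(SAMPLE)
widths = {len(row) for row in data}  # distinct row widths after the header
print(rows, cols, widths)
```

Once the three header lines are skipped, every remaining row should have the same width, which should match the --D value.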

User 547 | 7/30/2014, 8:21:26 PM

Hi Danny.

Thanks for your reply. I ran it with following command and got a double free error:

COMMAND: rbm --training=/test/RBM/A_try2_A_d_200_bin_10_iter_5/A.mm --rbm_mult_step_dec=0.9 --rbm_bins=10 --rbm_alpha=0.1 --rbm_beta=0.065 --minval=1 --maxval=10 --max_iter=5 --D=200

where A.mm contains {1.0, 2.0... 10.0} in the third column. The matrix has the header:

%%MatrixMarket matrix coordinate real symmetric
% Generated 27-Jul-2014
41508 13327560 39134623
1 1 1.0
1 2 1.0
...

Top memory usage was 201G. Using --D=30 gives a similar error. I ran it through valgrind, which reports memory leaks (at least on a smaller run; however, I am not sure my parameters are sane, not in terms of results but in terms of what GraphChi assumes about them). Let me know if you want the complete log for the run and/or the valgrind log.

By the way, I get DEBUG messages from my build. Does that mean the build has some DEBUG macros set that I need to turn off to get better performance?

Here's the error (complete log is attached):

...
=== REPORT FOR rbm-inmemory-factors() ===
[Numeric]
cachesize_mb: 0
compression: 1
execthreads: 64
loadthreads: 4
membudget_mb: 800
niothreads: 2
niters: 5
nshards: 2
nvertices: 1.33691e+07
scheduler: 0
stripesize: 1.07374e+09
updates: 6.68453e+07
work: 3.91346e+08
[Timings]
blockload: 0.712516s (count: 740, min: 5e-06s, max: 0.068947, avg: 0.000962859s)
commit_thr: 4.58335s (count: 375, min: 0.003707s, max: 0.113703, avg: 0.0122223s)
execute-updates: 1857.35s (count: 20, min: 0.04393s, max: 941.387, avg: 92.8675s)
iomgr_init: 0.000307 s
memoryshard_create_edges: 9.3145s (count: 20, min: 0.127974s, max: 1.03218, avg: 0.465725s)
memshard_commit: 0.561321s (count: 10, min: 0.000409s, max: 0.247925, avg: 0.0561321s)
preada_now: 4.96071s (count: 810, min: 3e-06s, max: 0.885781, avg: 0.00612433s)
read_next_vertices: 6.558s (count: 20, min: 5e-06s, max: 1.92532, avg: 0.3279s)
runtime: 1884.24 s
stripedio_wait_for_reads: 0.131038s (count: 20, min: 3e-06s, max: 0.050334, avg: 0.0065519s)
stripedio_wait_for_writes: 3.8e-05s (count: 15, min: 0s, max: 5e-06, avg: 2.53333e-06s)
[Other]
app: rbm-inmemory-factors
engine: default
file: /test/RBM/A_try2_A_d_200_bin_10_iter_5/A.mm
INFO: stripedio.hpp(~block_cache:170): Cache stats: hits=0 misses=1125
INFO: stripedio.hpp(~block_cache:171): -- in total had 0 MB in cache.
*** glibc detected *** /bin/GraphChi_25July2014/graphchi-cpp/toolkits/collaborative_filtering//rbm: double free or corruption (!prev): 0x000000001b9734a0 ***

-------------COMPLETE LOG------------------------

User 547 | 7/30/2014, 8:49:48 PM

Hi Danny, I think I realized my mistake. Let me retry and let you know if it works.

Thanks, Tara

User 547 | 7/30/2014, 10:15:24 PM

Hi Danny,

Attached is the log of the second attempt. It segfaults now. Please let me know what you think. Thanks, Tara

User 6 | 7/31/2014, 8:18:59 AM

This is strange, since the program finished running and wrote the output file, and it segfaulted only when exiting. I have no clue why. Which OS are you running on? Is this a virtual box?

User 547 | 7/31/2014, 5:05:16 PM

Hi Danny, No, it is not a virtual box. I'm running it on CentOS Linux while connected to a server. I also ran WALS with D=50 and iter=3 and got the same result. The log is attached here.

Thanks, Tara

User 6 | 7/31/2014, 7:13:40 PM

The strange thing is that again the program finished fine. The segfault happens only when it exits.
I suggest compiling in debug mode (make clean; make cfd) and trying again; maybe there will be a clearer error message. If not, you should run from within gdb as follows:
1) compile in debug
2) gdb ./toolkits/collaborative_filtering/wals
3) run --training=time_smallnetflix --validation=time_smallnetflixe --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=0
4) when the program segfaults, send us the output of the "where" command at the gdb prompt. Also send us the full output, and run with --quiet=0 to get more traces.

User 557 | 8/1/2014, 8:04:51 PM

Hi, Danny.

It runs fine with your netflix commandline. I think I sorted out what the problem was:

In the table at: http://bickson.blogspot.ca/2012/12/collaborative-filtering-with-graphchi.html , there seems to be a typo:

--rbm_bins=XX Total number of binary bins used. For example in Netflix data where we have 1,2,3,4,5 the number of bins is 5

That should be 6 (and it's 6 at other places in the post - I should have been more careful).

In my data, the integer rating values range from 1 to 10, and I was using --rbm_bins=10 (incorrectly, I think). In the debug version, I get the following assertion failure with this command line:

$BIN/graphchi_debug/graphchi-cpp/toolkits/collaborative_filtering/rbm --training=$OUT/test/RBM/double_check_A_munin_check_bin_10_30/A.mm --rbm_mult_step_dec=0.9 --rbm_bins=10 --rbm_alpha=0.1 --rbm_beta=0.065 --minval=1 --maxval=10 --max_iter=5 --D=30

rbm: rbm.cpp:248: virtual void RBMVerticesInMemProgram::update(graphchi::graphchi_vertex<vertex_data, float>&, graphchi::graphchi_context&): Assertion `r < rbm_bins' failed.
./RBM.sh: line 39: 2139 Aborted (core dumped) $CMD

return:134

So I reran with --rbm_bins=11, and now it goes through fine. It also goes through with --D=100, which is great. I will rebuild without debug and make sure this wasn't a fluke. Assuming it runs, I will look into validating it.

$BIN/graphchi_debug/graphchi-cpp/toolkits/collaborative_filtering/rbm --training=$OUT/test/RBM/double_check_A_munin_check_bin_11_30/A.mm --rbm_mult_step_dec=0.9 --rbm_bins=11 --rbm_alpha=0.1 --rbm_beta=0.065 --minval=1 --maxval=10 --max_iter=5 --D=30
...
INFO: rbm.cpp(rbm_init:371): RBM initialization ok
...
INFO: rbm.cpp(output_rbm_result:355): RBM output files (in matrix market format): $OUT/test/RBM/double_check_A_munin_check_bin_11_30/A.mm_U.mm, $OUT/test/RBM/double_check_A_munin_check_bin_11_30/A.mm_V.mm
...
INFO: stripedio.hpp(~block_cache:170): Cache stats: hits=0 misses=1125
INFO: stripedio.hpp(~block_cache:171): -- in total had 0 MB in cache.
return:0

$BIN/graphchi_debug/graphchi-cpp/toolkits/collaborative_filtering/rbm --training=$OUT/test/RBM/double_check_A_munin_d_100_retry_d_100_bins_11_100/A.mm --rbm_mult_step_dec=0.9 --rbm_bins=11 --rbm_alpha=0.1 --rbm_beta=0.065 --minval=1 --maxval=10 --max_iter=5 --D=100
...
INFO: rbm.cpp(rbm_init:371): RBM initialization ok
...
INFO: rbm.cpp(output_rbm_result:355): RBM output files (in matrix market format): ...
$OUT/test/RBM/double_check_A_munin_d_100_retry_d_100_bins_11_100/A.mm_U.mm, $OUT/test/RBM/double_check_A_munin_d_100_retry_d_100_bins_11_100/A.mm_V.mm
...
INFO: stripedio.hpp(~block_cache:170): Cache stats: hits=0 misses=1125
INFO: stripedio.hpp(~block_cache:171): -- in total had 0 MB in cache.
return:0

Do you think the incorrect value for --rbm_bins was the root of this?

Thanks for your help... it's appreciated :)
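
The assertion `r < rbm_bins` in the debug run above suggests a simple rule of thumb for picking --rbm_bins. Here is a minimal Python sketch, assuming the rating value r is compared against the bin count directly, as the assertion text implies; the thread does not show the surrounding source. The helper name `min_rbm_bins` is hypothetical and not part of GraphChi.

```python
import math


def min_rbm_bins(ratings):
    """Smallest integer bin count with r < bins for every rating r."""
    return math.floor(max(ratings)) + 1


# Netflix-style ratings 1..5, and ratings 1..10 as in this thread.
netflix_bins = min_rbm_bins([1, 2, 3, 4, 5])
thread_bins = min_rbm_bins([float(r) for r in range(1, 11)])
print(netflix_bins, thread_bins)
```

This reproduces the two cases discussed here: 6 bins for ratings 1 to 5, and 11 bins for ratings 1 to 10.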

User 6 | 8/3/2014, 5:27:48 AM

Thanks for the clarification! I have fixed the documentation.

User 547 | 8/5/2014, 2:44:12 AM

Hi again, I ran WALS several times on my data, but the U and V outputs contain only 0 values. The log is attached, and I would be grateful if you could let me know what the reason might be.

Also, regarding the updated version of the RBM documentation below, shouldn't the bins be 0,1,2,3,4,5 rather than 1 to 5?
--rbm_bins=XX Total number of binary bins used. For example in Netflix data where we have 1,2,3,4,5 the number of bins is 6

Thanks, Tara

User 6 | 8/5/2014, 6:40:56 AM

Hi,
1) Regarding RBM: in the Netflix example there are 6 bins, but only 5 of them are used.
2) Regarding WALS: there is not enough information to debug. I suggest starting from a small example to verify that it works. You should provide an input file with 4 columns in the format [user id] [item id] [weight] [rating], and verify that the weight is not zero.
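
The input check suggested here can be scripted before running WALS. The following is a rough Python sketch, assuming the input is a Matrix Market coordinate file whose first non-comment line is the "rows cols nonzeros" size line and whose entries follow the [user id] [item id] [weight] [rating] order given above. The function name `check_wals_entries` and the sample data are made up for illustration.

```python
def check_wals_entries(text):
    """Return (line_number, problem) pairs for suspicious entry lines."""
    problems = []
    size_seen = False
    for num, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and banner/comment lines
        if not size_seen:
            size_seen = True  # assumed "rows cols nonzeros" size line
            continue
        fields = line.split()
        if len(fields) != 4:
            problems.append((num, "expected 4 columns, found %d" % len(fields)))
        elif float(fields[2]) == 0.0:
            problems.append((num, "zero weight"))
    return problems


# Made-up sample: one good entry, one zero weight, one missing column.
SAMPLE = """%%MatrixMarket matrix coordinate real general
3 4 3
1 1 1 4.0
2 1 0 3.0
3 2 5.0
"""

problems = check_wals_entries(SAMPLE)
print(problems)
```

An empty result means every entry has four columns and a nonzero weight; it says nothing about whether the weights themselves are meaningful.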

I recommend trying out GraphLab Create. It offers matrix factorization with side features, for cases where you have additional information about the user (like age, zip code etc.), about the item (like weight, color, price etc.), and about the rating (like time of rating etc.). I suggest trying both matrix factorization with side features and factorization machines; these two algorithms are superior to RBM and WALS because they take the additional information into account.

User 547 | 8/5/2014, 9:56:27 PM

OK, thanks for your comments.
WALS ran successfully with:
./toolkits/collaborative_filtering/wals --training=time_smallnetflix --validation=time_smallnetflixe --lambda=0.065 --minval=1 --maxval=5 --max_iter=6 --quiet=1

But I still have the same problem running it on my matrix. Checking "training=time_smallnetflix", I noticed that the weight is between 1 and 27, and I'm wondering how you assigned those values, i.e. did you normalize the rating, or use some other calculation to obtain the weights? In my case I don't actually have a weight column, so I added a column of "1" to all matrix rows to run WALS on it. Do you think that may be why I get "0" values in my "U" and "V" matrices?

Thanks, Tara
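
For reference, the workaround described in this post (adding a constant weight of 1 to a three-column file) might look like the Python sketch below. It assumes the weight belongs in the third column, per the [user id] [item id] [weight] [rating] format mentioned earlier in the thread, and that the first non-comment line is the size line. The function name `add_unit_weight` and the sample data are hypothetical.

```python
def add_unit_weight(text, weight="1"):
    """Insert a constant weight as the third column of each entry line."""
    out = []
    size_seen = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            out.append(line)  # keep banner and comment lines as they are
        elif not size_seen:
            size_seen = True
            out.append(line)  # keep the "rows cols nonzeros" size line
        else:
            user, item, rating = line.split()
            out.append(" ".join([user, item, weight, rating]))
    return "\n".join(out) + "\n"


# Made-up three-column sample, not data from this thread.
SAMPLE = """%%MatrixMarket matrix coordinate real general
2 2 2
1 1 4.0
2 2 3.0
"""

converted = add_unit_weight(SAMPLE)
print(converted)
```

Note that with every weight equal to 1, the weighted squared-error term is the same as the unweighted one, so this only satisfies the input format and adds no information.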

User 6 | 8/6/2014, 6:03:06 AM

If you do not have the weight values, running WALS will not improve the output.