BPTF sample for GraphLab

User 471 | 7/17/2014, 12:48:53 AM

Hello,

I'm interested in tensor factorization using GraphLab. I'm trying to follow the instructions in the user manual (at the bottom of http://select.cs.cmu.edu/code/graphlab/pmf.html, copied below), but the link to the sample data is broken.

Is the data file no longer available? If not, is there any sample file that can be used for tensor factorization?

Thanks, Kenji


Running example: BPTF (Bayesian monte carlo matrix factorization) using Twitter social graph

This example was donated by Timmy Wilson @ smarttypes.org. It contains a Twitter network of 68 followers, 11646 followees, 1 day and 15883 links. Download the input file here

<29|0>bickson@biggerbro:~/newgraphlab/graphlabapi/debug/demoapps/pmf$ ./pmf smarttypespmf 1 --scheduler="roundrobin(maxiterations=20,blocksize=1)" --float=true
INFO: pmf.cpp(main:1260): PMF/ALS/SVD++/SGD Code written By Danny Bickson, CMU
Send bug reports and comments to danny.bickson@gmail.com
WARNING: pmf.cpp(main:1262): Code compiled with GLNOMULTEDGES flag - this mode does not support multiple edges between user and movie in different times
Setting run mode BPTFMATRIX
INFO: pmf.cpp(main:1309): BPTF_MATRIX starting

loading data file smarttypespmf
Loading smarttypespmf TRAINING
Matrix size is: USERS 68 MOVIES 11646 TIME BINS 1
Creating 15883 edges (observed ratings)...
.loading data file smarttypespmfe
Loading smarttypespmfe VALIDATION
skipping file
loading data file smarttypespmft
Loading smarttypespmft TEST

Comments

User 6 | 7/17/2014, 12:53:31 AM

Hi Kenji, You are looking at a REALLY old version of the code, GraphLab v1, which is no longer supported. I guess you are interested in tensor factorization because it handles an additional dimension and thus may improve the accuracy of the factorization. I suggest taking a look at GraphLab Create, where we have matrix factorization with side features, which is much easier to use and can handle multiple additional dimensions, including user and item features. http://graphlab.com/products/create/docs/generated/graphlab.recommender.create.html#graphlab.recommender.create


User 471 | 7/17/2014, 6:36:12 PM

Hi Danny,

Thanks for the quick reply.

Correct me if I'm wrong, but I was thinking that tensor factorization is a more general and mathematically elegant approach, while matrix factorization with side features is an ad hoc extension that may miss some feature interactions.

What was the idea behind the decision to drop tensor factorization in GraphLab? Computational efficiency? Stability or convergence problem? I'm just curious.


User 2567 | 1/20/2016, 10:53:03 PM

Same question as @kenjiyamada: "Correct me if I'm wrong, but I was thinking that tensor factorization is a more general and mathematically elegant approach, while matrix factorization with side features is an ad hoc extension that may miss some feature interactions.

What was the idea behind the decision to drop tensor factorization in GraphLab? Computational efficiency? Stability or convergence problem? I'm just curious."


User 1592 | 1/21/2016, 8:30:20 AM

Hi

Tensor factorization algorithms are hardly used in practice because of the exponential difficulty of implementing them in dimensions greater than or equal to 3. I suggest reading the following paper: Tensor Decompositions, Alternating Least Squares and other Tales. P. Comon, X. Luciani and A. L. F. de Almeida. Special issue, Journal of Chemometrics. In memory of R. Harshman. August 16, 2009. It explains the alternating least squares construction for d=3; for larger dimensions the implementation is simply not feasible in practice.
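
For readers unfamiliar with the d=3 construction mentioned above, here is a rough sketch of the standard CP (PARAFAC) model and one alternating least squares step, written in the common textbook notation rather than the notation of the cited paper. The symbols (factor matrices A, B, C, rank R, mode-1 unfolding X_(1)) are assumptions introduced for illustration.

```latex
% CP model for a 3-way tensor X of size I x J x K with rank R
x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}

% One ALS step: fix B and C, solve the least squares problem for A
A \leftarrow X_{(1)}\,(C \odot B)\,\bigl[(C^{\top}C) \ast (B^{\top}B)\bigr]^{\dagger}

% \odot : Khatri-Rao (column-wise Kronecker) product
% \ast  : element-wise (Hadamard) product
% \dagger : Moore-Penrose pseudoinverse
% The updates for B and C are analogous, using the mode-2 and mode-3 unfoldings.
```

Each pass cycles through the three updates until the fit stops improving.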

A much better approach (with a very elegant trick deployed) is Steffen Rendle's work: Steffen Rendle (2010): Factorization Machines, in Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia.
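
To make the trick concrete, here is a minimal sketch of the second-order factorization machine prediction from Rendle's paper, computed two ways: the naive sum over all feature pairs, and the reformulation that is linear in the number of features. The function names and the toy inputs are hypothetical and chosen only for illustration; this is not GraphLab code.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM prediction with an explicit loop over feature pairs.

    x  : list of n feature values (e.g. one-hot user, item and time bin)
    w0 : global bias
    w  : list of n linear weights
    V  : n x k list of lists, one latent vector per feature
    """
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same prediction using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ],
    which costs O(k * n) instead of O(k * n^2).
    """
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


# Toy input: features 0-1 encode the user, 2-3 the item, 4 a time bin.
x = [1.0, 0.0, 0.0, 1.0, 1.0]
w0 = 0.1
w = [0.2, -0.1, 0.05, 0.3, -0.2]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4], [-0.2, 0.5], [0.6, 0.1]]

# The two formulations should agree up to floating-point error.
assert abs(fm_predict_naive(x, w0, w, V) - fm_predict_fast(x, w0, w, V)) < 1e-9
```

With one-hot user, item and time features as above, the model captures every pairwise interaction among the three dimensions without a separate implementation per tensor order.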

In practice, we did not evaluate algorithms based on their mathematical beauty but on their practicality:
- are they easy to implement?
- can they scale to really large datasets?
- are they easy to tune?
- are they easy to debug?
- are they accurate enough?
- do they converge quickly?
- are they stable?
- etc.