Purpose of "linear_regularization" parameter in "matrix_factorization" recommender

User 362 | 6/17/2014, 11:50:51 PM

Hi all, I was trying to test the recommender with MatrixFactorizationModel. In the create function, I looked at the parameters specific to MatrixFactorizationModel. One of them is "linear_regularization", whose purpose is to control "Regularization for linear term". I have several questions about this parameter:

1. Which linear term are we talking about in this context?
2. The recommender.create function also has two parameters, "user_data" and "item_data". What is their purpose (specifically in the context of MatrixFactorizationModel)?
3. Are there tutorials illustrating the use of "linear_regularization" in MatrixFactorizationModel, and of "user_data" and "item_data"? If not, can we have one?
4. Can anyone point me to appropriate literature that discusses linear_regularization for the linear term in MatrixFactorizationModel?

Thanks, Mahmud


User 19 | 6/19/2014, 5:20:34 PM

Hi Mahmud,

For MatrixFactorizationModel, there are a few components to the prediction for user u and item i:

y_{ui} = \sum_f X_{uf} \beta_f + \sum_f W_{if} \gamma_f + \sum_k \theta_{uk} \phi_{ik}

where X_{uf} is the side feature value for user u and user feature f, and W_{if} is the side feature value for item i and item feature f. The last term is the classic matrix factorization term. The "linear_regularization" parameter applies regularization to the parameters involved in the first two terms, \beta_f and \gamma_f. At the moment this is restricted to data sets with explicit feedback, e.g. ratings.
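To make the three terms concrete, here is a minimal pure-Python sketch of the prediction for a single (user, item) pair. The function name `predict` and all the toy values are hypothetical and chosen only for illustration; this is not the library's implementation.

```python
def predict(x_u, beta, w_i, gamma, theta_u, phi_i):
    """Prediction for one (user, item) pair as the sum of three terms."""
    # Linear term over user side features: sum_f X_uf * beta_f
    user_side = sum(x * b for x, b in zip(x_u, beta))
    # Linear term over item side features: sum_f W_if * gamma_f
    item_side = sum(w * g for w, g in zip(w_i, gamma))
    # Classic matrix factorization term: sum_k theta_uk * phi_ik
    factors = sum(t * p for t, p in zip(theta_u, phi_i))
    return user_side + item_side + factors


# Hypothetical toy values: 2 user features, 1 item feature, 2 latent factors.
x_u, beta = [1.0, 0.0], [0.5, 2.0]
w_i, gamma = [2.0], [0.25]
theta_u, phi_i = [1.0, 2.0], [0.5, 0.5]

prediction = predict(x_u, beta, w_i, gamma, theta_u, phi_i)
print(prediction)
```

In this sketch, only `beta` and `gamma` would be subject to the linear regularization; `theta_u` and `phi_i` belong to the factor term.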

We are about to release a tutorial that shows how one can use user_data and item_data, so keep an eye out for that!

User 89 | 6/19/2014, 5:21:38 PM

Hello Mahmud,

I'll answer some of your questions now, and we'll be getting out a notebook soon that details the rest. (We can email you the draft immediately.)

The objective in the matrix factorization that we are using is the following:

Suppose we have users indexed as i \in {0, 1, ..., n_u - 1}; items indexed as j \in {n_u, n_u + 1, ..., n_u + n_v - 1}; and side features having indices k \in {n_u + n_v, n_u + n_v + 1, ..., N - 1}.

Using this indexing, we can list the linear features as a vector w of length N, so w_i is the linear weight associated with user i and w_j is the linear weight associated with item j. Similarly, denote the D-length latent factors for the users and items as V_i and V_j; thus V is a matrix of size (n_u + n_v) by D.

With no side features, the current objective is

\sum_{(i, j, y) \in observations} L(V_i' V_j + w_i + w_j, y) + (\lambda_w / 2) ||w||^2 + (\lambda_V / 2) ||V||^2,

where y is the centered rating for user i and item j, and L(y', y) is the squared error loss function. \lambda_w is the linear regularization term and \lambda_V is the factor regularization term. || . || denotes the L2 norm.
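As a rough sketch of the objective without side features, the following pure-Python function evaluates it on a toy problem. It assumes the squared error is (y' - y)^2 with no extra 1/2 factor and that ||V||^2 is the sum of squared entries of V; the function name `objective` and all the values are hypothetical.

```python
def objective(observations, V, w, lambda_w, lambda_V):
    """Squared-error loss plus L2 penalties on w and V (no side features)."""
    loss = 0.0
    for i, j, y in observations:
        # Prediction: V_i' V_j + w_i + w_j
        pred = sum(a * b for a, b in zip(V[i], V[j])) + w[i] + w[j]
        loss += (pred - y) ** 2
    # (lambda_w / 2) * ||w||^2 : the linear regularization penalty
    reg_w = (lambda_w / 2.0) * sum(x * x for x in w)
    # (lambda_V / 2) * ||V||^2 : the factor regularization penalty
    reg_V = (lambda_V / 2.0) * sum(x * x for row in V for x in row)
    return loss + reg_w + reg_V


# Hypothetical toy problem: users 0 and 1, items 2 and 3, D = 2.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
w = [0.5, 0.0, -0.5, 0.0]
observations = [(0, 2, 1.0), (1, 3, 0.0)]

unregularized = objective(observations, V, w, 0.0, 0.0)
regularized = objective(observations, V, w, 2.0, 2.0)
print(unregularized, regularized)
```

Raising `lambda_w` while holding `lambda_V` fixed penalizes only the linear weights, which is the role the linear regularization plays here.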

With side features, denote x_i as a vector over the side terms that is associated with user i, and let z_j be a similar vector over the side terms associated with item j. Many of these terms can be zero. Thus x_{ik} is the value of feature k that occurs when user i is seen. (This is basically the factorization machine notation.) One can think of this like a database join: each time user i comes up, the side vector x_i is joined onto the observation vector.
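The database-join picture can be illustrated with a few lines of plain Python. The user IDs, the "age" feature, and the variable names are all hypothetical; the sketch only shows side data being attached to every observation that mentions the same user.

```python
# Hypothetical side table: one row of side features per user.
user_side = {"u1": {"age": 30.0}, "u2": {"age": 41.0}}

# Hypothetical observations: (user, item, rating).
observations = [("u1", "a", 4.0), ("u2", "a", 3.0), ("u1", "b", 5.0)]

# Join: every time a user appears, that user's side features are
# copied onto the observation row (users with no side data get none).
joined = [
    {"user": u, "item": i, "rating": y, **user_side.get(u, {})}
    for u, i, y in observations
]

for row in joined:
    print(row)
```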

Then the objective function is

\sum_{(i, j, y) \in observations} L(V_i' V_j + w_i + w_j + \sum_k x_{ik} w_k + \sum_k z_{jk} w_k, y) + (\lambda_w / 2) ||w||^2 + (\lambda_V / 2) ||V||^2.
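A minimal sketch of this objective with side features, under the same assumptions as before: the loss is (y' - y)^2 and ||V||^2 is the sum of squared entries. Side vectors are stored as sparse dictionaries mapping a side-feature index k to its value, since many entries can be zero. The function name and the toy data are hypothetical.

```python
def objective_with_side(observations, V, w, x, z, lambda_w, lambda_V):
    """Objective with linear side-feature terms added to each prediction."""
    loss = 0.0
    for i, j, y in observations:
        pred = sum(a * b for a, b in zip(V[i], V[j])) + w[i] + w[j]
        # sum_k x_ik * w_k : side features joined on for user i
        pred += sum(val * w[k] for k, val in x.get(i, {}).items())
        # sum_k z_jk * w_k : side features joined on for item j
        pred += sum(val * w[k] for k, val in z.get(j, {}).items())
        loss += (pred - y) ** 2
    # w covers users, items, and side features, so all are penalized by lambda_w.
    reg_w = (lambda_w / 2.0) * sum(v * v for v in w)
    reg_V = (lambda_V / 2.0) * sum(v * v for row in V for v in row)
    return loss + reg_w + reg_V


# Hypothetical toy problem: user 0, item 1, one side feature at index 2 (N = 3).
V = [[1.0, 2.0], [0.5, 0.5]]   # latent factors for the user and the item only
w = [0.0, 0.0, 0.5]            # linear weights: user, item, side feature
x = {0: {2: 2.0}}              # user 0 has side feature 2 with value 2.0
z = {}                         # no item side features in this example
observations = [(0, 1, 3.0)]

total = objective_with_side(observations, V, w, x, z, 2.0, 2.0)
print(total)
```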

We're planning on adding some functionality to this soon, with more control over the regularization for specific parts of the problem, so if you have specific needs in this regard, please let me know. I'll send you updates as they become available.

Hope that helps! Let me know if it isn't clear.

Thanks! -- Hoyt