How to diagnose low precision in ranking factorization recommender?

User 1476 | 3/12/2015, 8:17:46 AM

Hi, I'm trying to create a recommender using Kiva's data dump with GraphLab Create 1.3.

The data contains pairwise 'lender-loan' interactions with no rating, i.e., implicit feedback. Each loan and lender has its own features.

I first tried the item similarity recommender with only the pairwise interaction data and got precision around 0.03-0.06. Then I switched to ranking factorization and tried different groups of features, but I ALWAYS got worse results, around 0.0003. When asked to recommend, this worse model predicts the same items for every user.

The parameters I used for ranking factorization are:

m = gl.recommender.ranking_factorization_recommender.create(train, user_id='lender_id', item_id='loan_id', item_data=loan_feature, user_data=lender_feature, num_factors=20, regularization=0.1, binary_target=True, max_iterations=20, solver='ials', verbose=True)

print m.evaluate(test, metric='precision_recall')

I don't think this result makes sense; factorization shouldn't always be worse than traditional item-item collaborative filtering. The absolute precision scores are also very low. Is that normal? How can I diagnose the problem?

Thanks!!! Please let me know if you need any more code/info.

Comments

User 18 | 3/12/2015, 6:19:26 PM

Hmm, there are so many knobs to tweak with factorization recommenders that it's always a bit of a mystery when things don't work out. Here are a few things I would try:

  1. Set binary_target=False. You don't have a target column since the data is implicit, so this should not have an effect. But I just want to be sure.
  2. Use the default regularization, or run model_parameter_search over it.
  3. Set side_data_factorization to False. This uses plain Matrix Factorization instead of the more general Factorization Machine. MF is easier to tune and sometimes more robust than FM, so this should give you a good baseline for factorization-type models. (You may need to tweak regularization again; see the sketch after this list.)
  4. Look through the side features and see if there are weird things: large value outliers? negative value outliers? heavy-tail distribution?
  5. If you have lots of numeric side features, try binning them into categorical variables using FeatureBinner (https://dato.com/products/create/docs/generated/graphlab.toolkits.featureengineering.featurebinner.FeatureBinner.html#graphlab.toolkits.featureengineering.featurebinner.FeatureBinner). Factorization Machines are sometimes sensitive to numeric side features; they have an easier time with categorical variables. The sketch after this list shows one way to do this.
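
To make points 3 and 5 concrete, here is a minimal sketch reusing the variable and column names from the original post (train, test, loan_feature, lender_feature, 'lender_id', 'loan_id'). The FeatureBinner options shown are assumptions, so check the docs for your version:

    import graphlab as gl

    # Point 3: baseline with side data kept as linear terms only
    # (plain matrix factorization instead of a factorization machine).
    baseline = gl.recommender.ranking_factorization_recommender.create(
        train,
        user_id='lender_id',
        item_id='loan_id',
        item_data=loan_feature,
        user_data=lender_feature,
        num_factors=20,
        side_data_factorization=False,
        max_iterations=20,
        verbose=True)
    print baseline.evaluate(test, metric='precision_recall')

    # Point 5: bin a numeric side column into categories before training.
    # 'loan_amount' is a hypothetical column name; substitute your own.
    binner = gl.feature_engineering.FeatureBinner(
        features=['loan_amount'], strategy='quantile', num_bins=10)
    loan_feature_binned = binner.fit_transform(loan_feature)

If the plain MF baseline already beats 0.0003, the problem most likely lies in how the side features enter the factorization, not in the interaction data itself.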

Lastly, it helps for us to know more about the data. How many unique users, items, and observations? How many side features and what are their types?

Alice


User 1207 | 3/12/2015, 7:09:30 PM

To add to what Alice said, the behavior you are experiencing is likely due to your regularization value. 0.1 is really high for this model; it shrinks the latent factors toward zero, which makes the predictions nearly uniform and is consistent with the same items being recommended to every user.

The other thing to note is that you should try a different solver than IALS. Currently, our IALS implementation only considers the user and item terms and ignores all the side information, so the default solver will likely do much better in this case. In other words, try something like:

m = gl.recommender.ranking_factorization_recommender.create(train, user_id='lender_id', item_id='loan_id', item_data=loan_feature, user_data=lender_feature, num_factors=20, regularization=1e-9, max_iterations=20, verbose=True)

Let us know if that helps.


User 1476 | 3/18/2015, 8:16:33 PM

Thank you for the answers.

The data contains over 800,000 loans and around 1.6M lenders. Yes, the data is super sparse (median of # of loans per lender is 1).

I tried tuning the parameters as suggested, standardized the numeric features, and selected only lenders who have over 5000 loans, which significantly reduces the sample size. But the recall didn't improve.
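
Roughly, the filtering step looked like this (same column names as in my first post):

    import graphlab as gl

    # Count interactions per lender, then keep only the very active lenders.
    counts = train.groupby('lender_id', {'n_loans': gl.aggregate.COUNT()})
    active = counts[counts['n_loans'] > 5000]['lender_id']
    train_active = train.filter_by(active, 'lender_id')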

It's quite weird because FM is supposed to do well with sparse data.


User 18 | 3/18/2015, 9:18:02 PM

The sparsity of the data may be the key problem here. It sounds like you have a lot of lenders with only one loan. Any recommender algorithm would have a hard time with that, because all collaborative filtering methods rely on the "collaborative" part, i.e., multiple users rate an item, and each user rates multiple items. Similarly, loans with only one lender are also a problem. In both cases, the dataset contains no correlation information linking such a user or item to the rest.
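
As a quick check, you can quantify this with a couple of group-bys (a sketch, using the column names from your post):

    import graphlab as gl

    # How many lenders and loans have only a single interaction?
    lender_counts = train.groupby('lender_id', {'n': gl.aggregate.COUNT()})
    loan_counts = train.groupby('loan_id', {'n': gl.aggregate.COUNT()})
    print (lender_counts['n'] == 1).sum(), 'lenders with a single loan'
    print (loan_counts['n'] == 1).sum(), 'loans with a single lender'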

Sometimes these problems can be ameliorated with side features. I'm still not sure why FM is doing worse than item similarity, but let's debug the sparsity problem first.

Can you try removing the loans with just one lender, as well as the lenders with only one loan? This should really be done iteratively, since after the first pass of removals you might discover more lenders that now have only one loan. Try the K-core algorithm (https://dato.com/products/create/docs/generated/graphlab.kcore.create.html#graphlab.kcore.create). This requires you to first create an SGraph representing the bipartite loan-lender graph. Set kmin=3, meaning that you retain only the loans and lenders connected to at least 3 other surviving lenders and loans. Then try the recommender on this new dataset, using the sketch below as a starting point.
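
A minimal sketch of that pipeline follows. The 'core_id' field name on the K-core model is my reading of the docs, so double-check it for your version, and note that the ID prefixing is just one way to keep lender and loan IDs from colliding:

    import graphlab as gl

    # Build the bipartite lender-loan graph, prefixing IDs so the two
    # vertex types cannot collide in a single ID space.
    edges = gl.SFrame({
        'lender': train['lender_id'].astype(str).apply(lambda x: 'L_' + x),
        'loan': train['loan_id'].astype(str).apply(lambda x: 'I_' + x)})
    g = gl.SGraph().add_edges(edges, src_field='lender', dst_field='loan')

    # Compute core numbers and keep only vertices in the 3-core.
    kc = gl.kcore.create(g, kmin=0, kmax=10)
    cores = kc['core_id']  # SFrame with '__id' and 'core_id' (assumed)
    keep = cores[cores['core_id'] >= 3]['__id']

    # Map surviving vertex IDs back to lenders and loans, then filter the
    # original interactions down to the dense core. If your IDs are numeric,
    # cast keep_lenders and keep_loans back before filtering.
    keep_lenders = keep[keep.apply(lambda v: v.startswith('L_'))].apply(lambda v: v[2:])
    keep_loans = keep[keep.apply(lambda v: v.startswith('I_'))].apply(lambda v: v[2:])
    dense = train.filter_by(keep_lenders, 'lender_id').filter_by(keep_loans, 'loan_id')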

If this still doesn't improve the recall and precision, then look at the side data to see whether the loans and lenders can be grouped or clustered by their attributes, and run the recommender on loan and lender groups instead of individuals.
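
For the grouping idea, one hedged sketch: map each loan to a coarser attribute (say a hypothetical 'sector' column in your loan side data) and recommend sectors to lenders instead of individual loans:

    import graphlab as gl

    # Attach each loan's sector to the interactions, then aggregate to
    # lender-sector pairs. 'sector' is a hypothetical side-data column.
    joined = train.join(loan_feature[['loan_id', 'sector']], on='loan_id')
    pairs = joined.groupby(['lender_id', 'sector'],
                           {'n_interactions': gl.aggregate.COUNT()})

    # Recommend sectors to lenders instead of individual loans.
    m_group = gl.recommender.ranking_factorization_recommender.create(
        pairs, user_id='lender_id', item_id='sector')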

Keep us posted about what you find!