GraphLab | The Five-Line Recommender, Explained

User 2 | 3/15/2014, 10:33:39 PM

        <div class="EmbeddedContent"><img src="http://graphlab.com/images/GLlogo_FU_STACKED_300.png" class="LeftAlign" /><strong>GraphLab | The Five-Line Recommender, Explained</strong>
           <p>Building a recommender system is easy with GraphLab Create. Simply import graphlab, load data, create a recommender model, and start making recommendations. Let's walk through this line by line.</p>
           <p><a href="http://graphlab.com/learn/notebooks/five_line_recommender.html">Read the full story here</a></p>
           <div class="ClearFix"></div>
        </div>

Comments

User 151 | 3/15/2014, 10:33:39 PM

Thanks for this great example! One question: I followed these 5 steps exactly, but I get different recommendations each time I run them. Is some randomization (or a hyperparameter) involved? How can I fix it?

output-1

     user         item                             score     rank
  0  Jacob Smith  All Quiet on the Western Front   8.297275  0
  1  Jacob Smith  The City of Lost Children        7.597459  1
  2  Jacob Smith  The Grapes of Wrath              7.113951  2
  3  Jacob Smith  The Seventh Seal                 7.110005  3
  4  Jacob Smith  Malcolm X                        7.091480  4
  5  Mason Smith  Ronin                            6.951424  0
  6  Mason Smith  True Grit                        6.471976  1
  7  Mason Smith  Tabu: A Story of the South Seas  6.359626  2
  8  Mason Smith  The Entertainer                  6.359424  3
  9  Mason Smith  The Town is Quiet                6.359424  4

Regards, Wenfeng


User 14 | 3/17/2014, 4:57:59 PM

Hi Wenfeng,

Thanks for your feedback. The default recommender model, matrix factorization, initializes its parameters randomly, which explains the non-deterministic recommendations you get. The create method does have a "random_seed" option; however, it currently only controls the randomness in the train/validation split. We will get this fixed.
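To illustrate why random initialization changes the results, here is a minimal pure-Python sketch of matrix factorization trained with SGD. It is not GraphLab's implementation: the function names, hyperparameters, and toy ratings are all made up for illustration. The only point is that fixing the seed of the random initialization makes repeated runs agree.

```python
import random

def factorize(ratings, n_factors=2, n_epochs=200, lr=0.05, reg=0.02, seed=None):
    """Fit a tiny matrix factorization model with SGD (illustrative only).

    ratings: list of (user, item, rating) tuples.
    seed: fixes the random initialization; None gives a different start each run.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Random initialization of the latent factors: this is the source of
    # run-to-run variation when no seed is given.
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for k in range(n_factors):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, user, item):
    """Predicted score is the dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))

# Hypothetical toy data, not taken from the tutorial.
ratings = [
    ("ann", "A", 5.0), ("ann", "B", 3.0),
    ("bob", "A", 4.0), ("bob", "C", 1.0),
    ("cat", "B", 2.0), ("cat", "C", 5.0),
]

P1, Q1 = factorize(ratings, seed=42)
P2, Q2 = factorize(ratings, seed=42)
# With the same seed, both runs produce identical factors and scores.
same_seed_match = predict(P1, Q1, "ann", "C") == predict(P2, Q2, "ann", "C")
```

Calling factorize(ratings) without a seed starts from a different random point on each run, so the predicted scores and the resulting ranking can change between runs.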

In the meantime, there are other models that are deterministic, e.g. itemsimilarity. Please replace step 3 with: model = graphlab.itemsimilarity.create(data, user="user", item="movie", target="rating"), and you should get stable recommendations in step 4.

Thanks! -jay


User 9 | 3/20/2014, 5:24:33 PM

@"Wenfeng Wang" - thanks again for your feedback. We have made the random_seed fix Jay mentioned above. It will be included in our next release (targeting mid-April).


User 151 | 3/20/2014, 7:29:52 PM

Thanks Jay & Timmuss for the quick response. If I understand correctly, the random_seed option is used to draw a random training sample (with the rest used for validation)?


User 14 | 3/20/2014, 8:02:36 PM

Right. random_seed currently only affects the "holdoutprobability" option in the create function. That option lets you specify the fraction of the data used for training; the rest is used for validation.
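As a rough sketch of what a seeded train/validation split looks like (plain Python, not GraphLab's code): holdout_split and holdout_prob are hypothetical names, and the sketch assumes the probability is the chance that each row is held out for validation.

```python
import random

def holdout_split(rows, holdout_prob=0.2, seed=None):
    """Randomly split rows into (train, validation).

    holdout_prob: assumed here to be the chance a row goes to validation.
    seed: fixes which rows are held out, so the split is reproducible.
    """
    rng = random.Random(seed)
    train, validation = [], []
    for row in rows:
        if rng.random() < holdout_prob:
            validation.append(row)
        else:
            train.append(row)
    return train, validation

rows = list(range(20))  # stand-in for rating records
train_a, valid_a = holdout_split(rows, holdout_prob=0.25, seed=7)
train_b, valid_b = holdout_split(rows, holdout_prob=0.25, seed=7)
# Same seed, same split; a different seed would generally hold out other rows.
```

Note that the seed here only pins down which rows are held out. It says nothing about how the model itself is initialized, which is the separate source of randomness discussed above.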


User 379 | 6/19/2014, 7:32:37 PM

In the API docs I can't find a load function for saved models. For example, how can we later load this model saved in "my_model"?


User 18 | 6/20/2014, 7:03:42 AM

Ah, load_model() is under utilities: http://graphlab.com/products/create/docs/graphlab.toolkits.html#utilities

It works for any model from any toolkit, so it got placed in utilities instead of within a model or a toolkit.


User 416 | 7/1/2014, 8:10:12 PM

I've been taking all of the tutorials for Graphlab Recommender systems, and I was curious to know if there is a current framework that can handle item-based recommendations. From the models available in the API, it seems as though this might be possible with either the Item Means Model or the Item Similarity Model. If so, how would I go about building an item-based recommender? That is, instead of saying "user bought item A so we recommend items B and C for him/her", is it possible to say "item A exists so we recommend items B and C go well with it", even though it is trained from the same user-bought-this-item pair (more of a "frequently bought together" kind of list)? Perhaps this might even just be a matter of what I'm printing, but it seems to be user-specific.


User 18 | 7/1/2014, 9:24:59 PM

If I understand your question, you're asking for a way to get the most similar items for a given item, is that right? This functionality will be available in the upcoming release (due out within a month).
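As an illustration of the idea only (this is not GraphLab's API or algorithm), a "frequently bought together" list can be sketched in plain Python by comparing the sets of users who bought each item. The sketch below uses Jaccard similarity; the function name similar_items and the toy purchase data are made up.

```python
from collections import defaultdict

def similar_items(pairs, item, k=2):
    """Return the k items most similar to `item`, by Jaccard similarity
    of the sets of users who bought them.

    pairs: iterable of (user, item) purchase records.
    """
    users_by_item = defaultdict(set)
    for user, it in pairs:
        users_by_item[it].add(user)
    target = users_by_item.get(item, set())
    scores = []
    for other, users in users_by_item.items():
        if other == item:
            continue
        union = target | users
        score = len(target & users) / len(union) if union else 0.0
        if score > 0:
            scores.append((other, score))
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]

# Hypothetical user-bought-item pairs.
pairs = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "C"),
    ("u4", "B"), ("u4", "D"),
]
top = similar_items(pairs, "A")  # items ranked by overlap with buyers of "A"
```

The result depends only on the item, not on any particular user, which is the difference from the user-specific recommendations described in the question.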


User 379 | 7/21/2014, 5:21:08 PM

Is it possible to save a model as text? model.save saves binary files by default.


User 14 | 7/22/2014, 7:24:06 AM

Hey. What do you want to achieve by saving the model as a text file? Text is probably not the best format for storing models, since there may be a lot of parameters. However, I can imagine it being useful if the saved model included a header file containing the hyperparameters and training summary statistics. Would that help solve your problem?


User 379 | 7/22/2014, 2:24:26 PM

Hi, yes, I want to extract the model parameters.


User 14 | 7/22/2014, 3:49:37 PM

Some model parameters are stored as numbers, while others are stored in SFrames. In the latter case, you can extract the SFrame with the m.get() call and then use sframe.save(format='csv') to save it as text.
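For a sense of what the text output of a parameter table looks like, here is a small stand-in using only Python's standard csv module instead of an SFrame. The column names and values are invented for illustration and do not come from a real model.

```python
import csv
import io

def table_to_csv(columns, rows):
    """Serialize a small table (header plus rows) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical per-item parameter table, standing in for an extracted SFrame.
columns = ["item", "factor_0", "factor_1"]
rows = [("A", 0.12, -0.40), ("B", 0.33, 0.05)]
text = table_to_csv(columns, rows)
```

Each row becomes one line of text, so large parameter tables grow quickly in this format, which is the trade-off mentioned earlier in the thread.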


User 379 | 7/22/2014, 4:15:17 PM

Can I use m.get() to extract latent factors for users and items from the model?


User 14 | 7/22/2014, 5:17:48 PM

Yes, querying for latent factors from a matrix factorization model is available in the latest version, 0.9. m.keys() should list all the fields that you can query for a model.


User 18 | 7/24/2014, 11:54:54 PM

@milad621, m['coefficients'] will return a dictionary of all of the learned parameters in a matrix factorization model, including the latent factors.


User 722 | 9/15/2014, 3:24:38 PM

Good work. I wish there were more "Graph Labs" in "Big Data" ... the whole approach removes mountains of obfuscation. Where can I find the architecture of the GraphLab system?

Thanks ...


User 9 | 9/15/2014, 4:10:18 PM

@dbjdbj‌ We are really happy to hear GraphLab Create is helping you!

Please see this page for a high-level architecture diagram. If you would like more details, just let us know what you need.

http://graphlab.com/products/create/technology.html