How to get decent results on Netflix and Yahoo Music data using GraphLab Create

User 1802 | 4/24/2015, 12:16:39 AM

I am using GraphLab Create to build a recommender system on the Netflix competition data and the Yahoo Music data, but I cannot get good estimates. Am I doing something wrong? Thanks! Below are the logs from a run on the Netflix data.

PROGRESS: Preparing data set.
PROGRESS: Data has 99072112 observations with 480189 users and 17770 items.
PROGRESS: Data prepared in: 44.4499s
PROGRESS: Training factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter                      | Description                                      | Value    |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors                    | Factor Dimension                                 | 1024     |
PROGRESS: | regularization                 | L2 Regularization on Factors                     | 0.005    |
PROGRESS: | solver                         | Solver used for training                         | sgd      |
PROGRESS: | linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-10    |
PROGRESS: | max_iterations                 | Maximum Number of Iterations                     | 15       |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: Optimizing model using SGD; tuning step size.
PROGRESS: Using 12384014 / 99072112 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value                |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0       | 0.000148746       | Not Viable                               |
PROGRESS: | 1       | 3.71864e-05       | 1.09998                                  |
PROGRESS: | 2       | 1.85932e-05       | 1.12131                                  |
PROGRESS: | 3       | 9.2966e-06        | 1.13869                                  |
PROGRESS: | 4       | 4.6483e-06        | 1.15098                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final   | 3.71864e-05       | 1.09998                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: | Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: | Initial | 226us        | 1.17631           | 1.08458               |             |
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: | 1       | 26.77s       | DIVERGED          | DIVERGED              | 3.71864e-05 |
PROGRESS: | RESET   | 32.97s       | 1.1763            | 1.08457               |             |
PROGRESS: | 1       | 59.77s       | DIVERGED          | DIVERGED              | 1.85932e-05 |
PROGRESS: | RESET   | 1m 5s        | 1.1763            | 1.08457               |             |
PROGRESS: | 1       | 1m 28s       | 1.11053           | 1.05382               | 9.2966e-06  |
PROGRESS: | 2       | 1m 51s       | 1.07658           | 1.03758               | 1.65858e-06 |
PROGRESS: | 3       | 2m 12s       | 1.07262           | 1.03567               | 9.10509e-07 |
PROGRESS: | 4       | 2m 35s       | 1.07048           | 1.03464               | 6.27491e-07 |
PROGRESS: | 5       | 2m 57s       | 1.06903           | 1.03394               | 4.78696e-07 |
PROGRESS: | 6       | 3m 20s       | 1.06794           | 1.03341               | 3.86942e-07 |
PROGRESS: | 7       | 3m 42s       | 1.06707           | 1.03299               | 3.24704e-07 |
PROGRESS: | 8       | 4m 5

FYI: If you are using Anaconda and having problems with NumPy

Hello everyone,

I ran into an issue a few days ago and found something that may be affecting many GraphLab users who run it with Anaconda on Windows. NumPy was unable to load, and consequently neither could anything that depends on it (Matplotlib, etc.).

It turns out that the current NumPy build (1.10.4) for Windows is problematic (more info here).

Possible workarounds are downgrading to build 1.10.1 or forcing an upgrade to 1.11.0 if your dependenci

Comments

User 1190 | 4/24/2015, 12:31:31 AM

Hi @ziqiliu

The default settings work pretty well for me on Netflix.

Can you try the following code?

    import graphlab as gl
    sf = gl.SFrame.read_csv('netflix_mm', delimiter=' ', header=False)
    m = gl.recommender.create(sf, user_id='X1', item_id='X2', target='X3')
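As for the DIVERGED and RESET rows in your log: they suggest the solver restores the initial state and retries with a smaller step size whenever a pass blows up (the step sizes in the log halve between retries). Here is a minimal plain-Python sketch of that backoff idea on toy data. It is not GraphLab Create's actual implementation; the function names, the toy ratings, and the divergence test (non-finite RMSE, or RMSE above its initial value) are all assumptions made for illustration.

```python
import copy
import math
import random


def rmse(ratings, P, Q):
    """Root-mean-square error of the dot-product predictions."""
    total = 0.0
    for u, i, r in ratings:
        err = r - sum(a * b for a, b in zip(P[u], Q[i]))
        total += err * err
    return math.sqrt(total / len(ratings))


def sgd_pass(ratings, P, Q, step, reg):
    """One SGD sweep over (user, item, rating) triples, updating P and Q in place."""
    for u, i, r in ratings:
        pu, qi = P[u], Q[i]
        err = r - sum(a * b for a, b in zip(pu, qi))
        for k in range(len(pu)):
            a, b = pu[k], qi[k]  # use the old values for both updates
            pu[k] = a + step * (err * b - reg * a)
            qi[k] = b + step * (err * a - reg * b)


def train_with_backoff(ratings, n_users, n_items, k=2, step=1.0,
                       reg=0.01, iters=10, max_resets=40, seed=0):
    """Hypothetical helper: halve the step size and restart on divergence."""
    rng = random.Random(seed)
    init_P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    init_Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    initial = rmse(ratings, init_P, init_Q)

    for resets in range(max_resets + 1):
        # RESET: every attempt starts again from the same initial factors.
        P, Q = copy.deepcopy(init_P), copy.deepcopy(init_Q)
        history = []
        for _ in range(iters):
            sgd_pass(ratings, P, Q, step, reg)
            cur = rmse(ratings, P, Q)
            if not math.isfinite(cur) or cur > initial:
                break  # DIVERGED: abandon this attempt
            history.append(cur)
        else:
            # All passes stayed viable, so accept this step size.
            return {"step": step, "resets": resets,
                    "initial_rmse": initial, "history": history}
        step /= 2.0  # retry with half the step size
    raise RuntimeError("no viable step size found")


if __name__ == "__main__":
    # Toy (user, item, rating) triples, made up for this example.
    toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
    result = train_with_backoff(toy, n_users=3, n_items=3)
    print(result["step"], result["resets"], result["history"][-1])
```

The log also shows the step size shrinking from one iteration to the next after a viable start, which this sketch leaves out. With 1024 factors the model has far more room to blow up than with the defaults, which may be why the defaults behave better here.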


User 1802 | 4/24/2015, 1:23:12 AM

Hi Jay Gu

Thanks a lot. It works pretty well.